Keras: Model Saving is extremely slow

Created on 3 Feb 2018  ·  22 Comments  ·  Source: keras-team/keras

I'm using Keras with Anaconda 4.0.0, Python 3.5, and the TensorFlow backend.
In summary, my model consists of:
A) 16 Embedding layers with vocab sizes ranging from 3 to 13 and input length 1.
B) 64 Embedding layers with vocab sizes ranging from 2 to 16 and input lengths ranging from 8 to 200.
C) 64 Embedding layers with vocab size 9 and input lengths ranging from 8 to 200.
D) 64 Embedding layers with vocab size 9 and input lengths ranging from 8 to 200.
E) 64 Embedding layers with vocab size 9 and input lengths ranging from 8 to 200.
All embedding sizes are set to 6.
B, C, D, and E are concatenated along the embedding size dimension to form 64 tensors, EACH of which is fed into a block that does: Time-distributed(8) / Conv1d(6,2)-Conv1d(6,2)-Conv1d(6,2) / GlobalMaxPooling / Flattened
The flattened tensors are merged with the flattened tensors of A) and fed into a dense layer of size 1, which is the final layer.

Total params: 45,463
Trainable params: 45,463

Trying to save the model takes over an hour. In fact, I don't know how long it takes, because even after an hour it had not finished saving. Any tips on what is going on? Is this expected? If so, how can I speed this up? As it is right now, I can barely save the model itself, so I cannot even use callbacks to save 'best models' across epochs.

Saving the weights and the structure separately results in the same situation.
If it helps, model initialization and compilation take less than a minute, and each batch of 256 is computed in less than a second. So model initialization and training are fast enough. Nothing out of this world.
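
To see which of the two steps accounts for the delay, each one can be timed on its own. The following is a minimal sketch, not the code from this issue: to_json() and save_weights() are the usual Keras model methods, while timed_save, StubModel, and the file names are hypothetical names used only for illustration. The stub stands in for a real model so that the snippet runs without Keras installed.

```python
import os
import tempfile
import time


def timed_save(model, json_path, weights_path):
    """Save architecture and weights separately; return seconds spent on each step."""
    timings = {}

    start = time.perf_counter()
    with open(json_path, "w") as f:
        f.write(model.to_json())        # architecture only
    timings["to_json"] = time.perf_counter() - start

    start = time.perf_counter()
    model.save_weights(weights_path)    # weights only
    timings["save_weights"] = time.perf_counter() - start

    return timings


class StubModel:
    """Hypothetical stand-in exposing the same two methods as a Keras model."""

    def to_json(self):
        return '{"class_name": "Model"}'

    def save_weights(self, path):
        # A real model would write its weights here; the stub writes an empty file.
        with open(path, "wb") as f:
            f.write(b"")


with tempfile.TemporaryDirectory() as tmp:
    timings = timed_save(
        StubModel(),
        os.path.join(tmp, "model.json"),
        os.path.join(tmp, "weights.h5"),
    )

for step, seconds in timings.items():
    print(f"{step}: {seconds:.4f} s")
```

With a real model passed in place of StubModel(), the larger of the two numbers points to the step worth profiling further.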

Any advice would be much appreciated.

Thanks
J

To investigate

Most helpful comment

I am also having this problem -- many concatenated layers and super-slow saving to HDF5 during checkpointing. I'm using the R interface to Keras with TF.

All 22 comments

I'm suffering from the same issue; I think saving the embedding layers is what is causing the problem.
I ran %prun in a Jupyter notebook in front of the fit call to find out what is taking the most time.
I ran this on Google Colab with GPU acceleration (Tesla K80). Without ModelCheckpoint, it took 3 seconds per epoch.

 752483 function calls (752449 primitive calls) in 3.674 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       17    2.825    0.166    2.825    0.166 {built-in method _pywrap_tensorflow_internal.TF_Run}
       17    0.257    0.015    0.257    0.015 training.py:400(<listcomp>)
   191322    0.117    0.000    0.117    0.000 {built-in method builtins.getattr}
       34    0.054    0.002    3.212    0.094 session.py:1036(_run)
   142123    0.036    0.000    0.046    0.000 {built-in method builtins.isinstance}
       17    0.032    0.002    0.152    0.009 backend.py:544(_initialize_variables)
     5899    0.028    0.000    0.113    0.000 tensor_shape.py:693(is_compatible_with)
    29546    0.028    0.000    0.028    0.000 ops.py:1623(name)
    11815    0.018    0.000    0.039    0.000 ops.py:302(name)
    23494    0.014    0.000    0.031    0.000 tensor_shape.py:381(as_dimension)
     5899    0.012    0.000    0.049    0.000 tensor_shape.py:420(__init__)
    11747    0.011    0.000    0.019    0.000 tensor_shape.py:83(is_compatible_with)
      163    0.011    0.000    0.011    0.000 {method 'acquire' of '_thread.lock' objects}
    11747    0.011    0.000    0.013    0.000 tensor_shape.py:27(__init__)
     5933    0.010    0.000    0.030    0.000 ops.py:3037(_as_graph_element_locked)
     6019    0.010    0.000    0.010    0.000 {built-in method numpy.core.multiarray.array}
    11832    0.010    0.000    0.020    0.000 compat.py:46(as_bytes)
    29597    0.009    0.000    0.013    0.000 ops.py:473(__hash__)
       17    0.009    0.001    0.009    0.001 ops.py:3343(get_collection)
     5899    0.008    0.000    0.014    0.000 session.py:1038(_feed_fn)
     5933    0.008    0.000    0.038    0.000 ops.py:3002(as_graph_element)
       17    0.007    0.000    0.033    0.002 nest.py:226(flatten_dict_items)
       98    0.007    0.000    0.007    0.000 {built-in method posix.urandom}
     5899    0.006    0.000    0.057    0.000 tensor_shape.py:844(as_shape)
     5899    0.006    0.000    0.011    0.000 abc.py:178(__instancecheck__)
     5899    0.006    0.000    0.015    0.000 session_ops.py:253(_get_handle_feeder)
       34    0.006    0.000    0.025    0.001 session.py:1381(_update_with_movers)
     5916    0.006    0.000    0.027    0.000 session.py:1286(<genexpr>)
     5899    0.006    0.000    0.031    0.000 tensor_shape.py:455(<listcomp>)
     5899    0.006    0.000    0.009    0.000 ops.py:4224(is_feedable)
    11764    0.005    0.000    0.007    0.000 tensor_shape.py:473(ndims)
       17    0.005    0.000    3.378    0.199 backend.py:2462(__call__)
     5899    0.005    0.000    0.021    0.000 nest.py:114(is_sequence)
    11798    0.004    0.000    0.004    0.000 _weakrefset.py:70(__contains__)
     5916    0.004    0.000    0.007    0.000 ops.py:297(graph)
    11832    0.004    0.000    0.004    0.000 {method 'encode' of 'str' objects}
    29597    0.004    0.000    0.004    0.000 {built-in method builtins.id}
     5899    0.003    0.000    0.005    0.000 ops.py:372(get_shape)
    17748    0.003    0.000    0.003    0.000 session.py:708(graph)
       17    0.003    0.000    0.156    0.009 backend.py:345(get_session)
     5933    0.003    0.000    0.003    0.000 ops.py:1942(graph)
     5933    0.003    0.000    0.010    0.000 ops.py:109(_as_graph_element)
     5899    0.003    0.000    0.018    0.000 session_ops.py:273(_get_handle_mover)
     5901    0.002    0.000    0.012    0.000 numeric.py:424(asarray)
        1    0.002    0.002    3.669    3.669 training.py:1039(_fit_loop)
       17    0.002    0.000    2.856    0.168 session.py:1258(_do_run)
     5899    0.002    0.000    0.002    0.000 dtypes.py:123(as_numpy_dtype)
    14292    0.002    0.000    0.002    0.000 {built-in method builtins.len}
    11696    0.002    0.000    0.002    0.000 tensor_shape.py:78(value)
     3530    0.002    0.000    0.002    0.000 {built-in method builtins.hasattr}
    11764    0.002    0.000    0.002    0.000 tensor_shape.py:468(dims)
     5950    0.002    0.000    0.002    0.000 ops.py:287(op)
     5899    0.002    0.000    0.003    0.000 backend.py:441(is_sparse)
       98    0.002    0.000    0.009    0.000 iostream.py:180(schedule)
     5899    0.002    0.000    0.002    0.000 session.py:125(<lambda>)
     5899    0.002    0.000    0.002    0.000 ops.py:314(shape)
     5917    0.001    0.000    0.001    0.000 {built-in method builtins.iter}
        4    0.001    0.000    0.002    0.000 training.py:39(_standardize_input_data)
     5899    0.001    0.000    0.001    0.000 ops.py:292(dtype)
       13    0.001    0.000    0.018    0.001 generic_utils.py:275(update)
     6037    0.001    0.000    0.001    0.000 {method 'get' of 'dict' objects}
       34    0.001    0.000    3.213    0.094 session.py:781(run)
        2    0.001    0.000    0.003    0.001 topology.py:715(<listcomp>)
       26    0.001    0.000    0.001    0.000 {method 'partition' of 'numpy.ndarray' objects}
       34    0.001    0.000    0.003    0.000 session.py:401(__init__)
       13    0.001    0.000    0.001    0.000 callbacks.py:228(on_batch_end)
       40    0.001    0.000    0.001    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.001    0.001    0.310    0.310 training.py:1296(_test_loop)
       26    0.001    0.000    0.005    0.000 function_base.py:4125(_median)
       26    0.001    0.000    0.002    0.000 utils.py:1119(_median_nancheck)
       40    0.000    0.000    0.001    0.000 _methods.py:53(_mean)
       34    0.000    0.000    0.001    0.000 session.py:458(build_results)
       52    0.000    0.000    0.001    0.000 numeric.py:1459(normalize_axis_tuple)
       17    0.000    0.000    2.826    0.166 session.py:1290(_run_fn)
       26    0.000    0.000    0.001    0.000 numeric.py:1515(moveaxis)
       13    0.000    0.000    0.021    0.002 callbacks.py:117(on_batch_end)
       40    0.000    0.000    0.002    0.000 fromnumeric.py:2854(mean)
       57    0.000    0.000    0.008    0.000 iostream.py:342(write)
    68/34    0.000    0.000    0.001    0.000 session.py:215(for_fetch)
       17    0.000    0.000    0.000    0.000 errors_impl.py:467(__exit__)
       34    0.000    0.000    0.000    0.000 session.py:347(build_results)
        1    0.000    0.000    3.674    3.674 training.py:1436(fit)
       26    0.000    0.000    0.006    0.000 function_base.py:3982(_ureduce)
      111    0.000    0.000    0.000    0.000 threading.py:1104(is_alive)
       13    0.000    0.000    0.003    0.000 callbacks.py:97(on_batch_begin)
       51    0.000    0.000    0.000    0.000 contextlib.py:85(__exit__)
       51    0.000    0.000    0.000    0.000 contextlib.py:59(__init__)
       17    0.000    0.000    0.000    0.000 {method 'tolist' of 'numpy.ndarray' objects}
       26    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
       13    0.000    0.000    0.017    0.001 callbacks.py:302(on_batch_end)
      102    0.000    0.000    0.000    0.000 ops.py:4334(get_controller)
       34    0.000    0.000    0.001    0.000 session.py:334(__init__)
       51    0.000    0.000    0.000    0.000 ops.py:2611(version)
       51    0.000    0.000    0.001    0.000 ops.py:3237(as_default)
       34    0.000    0.000    0.000    0.000 session.py:295(_uniquify_fetches)
       53    0.000    0.000    0.000    0.000 ops.py:4500(get_default)
      104    0.000    0.000    0.000    0.000 numerictypes.py:631(issubclass_)
       52    0.000    0.000    0.000    0.000 numerictypes.py:699(issubdtype)
       17    0.000    0.000    0.257    0.015 training.py:373(_slice_arrays)
        1    0.000    0.000    0.000    0.000 {method 'shuffle' of 'mtrand.RandomState' objects}
       13    0.000    0.000    0.011    0.001 threading.py:263(wait)
      236    0.000    0.000    0.000    0.000 {built-in method builtins.issubclass}
       13    0.000    0.000    0.000    0.000 threading.py:215(__init__)
       51    0.000    0.000    0.000    0.000 contextlib.py:157(helper)
      373    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
       40    0.000    0.000    0.000    0.000 _methods.py:43(_count_reduce_items)
       34    0.000    0.000    0.001    0.000 session.py:251(__init__)
       13    0.000    0.000    0.011    0.001 threading.py:533(wait)
       13    0.000    0.000    0.012    0.001 iostream.py:311(flush)
       26    0.000    0.000    0.006    0.000 function_base.py:4037(median)
       26    0.000    0.000    0.001    0.000 fromnumeric.py:578(partition)
      111    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
       98    0.000    0.000    0.000    0.000 iostream.py:87(_event_pipe)
        6    0.000    0.000    0.000    0.000 training.py:221(<listcomp>)
       70    0.000    0.000    0.000    0.000 ops.py:4317(get_default)
       17    0.000    0.000    2.826    0.166 session.py:1321(_do_call)
       17    0.000    0.000    0.000    0.000 session.py:1338(_extend_graph)
      102    0.000    0.000    0.000    0.000 {built-in method builtins.next}
       17    0.000    0.000    0.000    0.000 {built-in method _pywrap_tensorflow_internal.TF_GetCode}
       17    0.000    0.000    0.000    0.000 c_api_util.py:29(__init__)
       14    0.000    0.000    0.001    0.000 {method 'mean' of 'numpy.generic' objects}
      118    0.000    0.000    0.001    0.000 numeric.py:495(asanyarray)
       26    0.000    0.000    0.000    0.000 {built-in method builtins.sorted}
       57    0.000    0.000    0.000    0.000 iostream.py:284(_is_master_process)
       17    0.000    0.000    0.009    0.001 variables.py:1225(global_variables)
       53    0.000    0.000    0.000    0.000 ops.py:4542(get_default_graph)
       92    0.000    0.000    0.000    0.000 {built-in method time.time}
       17    0.000    0.000    0.009    0.001 ops.py:4831(get_collection)
       17    0.000    0.000    0.000    0.000 {built-in method _pywrap_tensorflow_internal.TF_NewStatus}
       34    0.000    0.000    0.000    0.000 session.py:516(<listcomp>)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.copyto}
       13    0.000    0.000    0.000    0.000 threading.py:498(__init__)
       57    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
       17    0.000    0.000    0.000    0.000 errors_impl.py:463(__enter__)
       34    0.000    0.000    0.000    0.000 ops.py:1925(type)
      104    0.000    0.000    0.000    0.000 numeric.py:1506(<genexpr>)
       17    0.000    0.000    0.000    0.000 ops.py:4401(get_default_session)
       26    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
       51    0.000    0.000    0.000    0.000 contextlib.py:79(__enter__)
       51    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
       57    0.000    0.000    0.005    0.000 iostream.py:297(_schedule_flush)
       34    0.000    0.000    0.001    0.000 session.py:341(<listcomp>)
       17    0.000    0.000    0.000    0.000 ops.py:5058(_assert_collection_is_ok)
       17    0.000    0.000    0.000    0.000 context.py:327(in_eager_mode)
       17    0.000    0.000    0.000    0.000 c_api_util.py:32(__del__)
        6    0.000    0.000    0.000    0.000 training.py:215(set_of_lengths)
       26    0.000    0.000    0.000    0.000 core.py:6190(isMaskedArray)
        2    0.000    0.000    0.005    0.002 training.py:1368(_standardize_user_data)
        1    0.000    0.000    3.674    3.674 {built-in method builtins.exec}
       34    0.000    0.000    0.000    0.000 session.py:287(build_results)
       34    0.000    0.000    0.000    0.000 session.py:507(_name_list)
       13    0.000    0.000    0.000    0.000 callbacks.py:298(on_batch_begin)
       34    0.000    0.000    0.000    0.000 ops.py:4232(is_fetchable)
       52    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.normalize_axis_index}
        1    0.000    0.000    3.674    3.674 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 training.py:358(_make_batches)
       17    0.000    0.000    0.000    0.000 context.py:148(in_eager_mode)
       17    0.000    0.000    0.000    0.000 six.py:586(iteritems)
       34    0.000    0.000    0.000    0.000 session.py:437(_assert_fetchable)
       17    0.000    0.000    0.000    0.000 {built-in method _pywrap_tensorflow_internal.TF_DeleteStatus}
       99    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.empty}
       34    0.000    0.000    0.000    0.000 session.py:344(unique_fetches)
       34    0.000    0.000    0.000    0.000 session.py:124(<lambda>)
       39    0.000    0.000    0.000    0.000 {method 'append' of 'collections.deque' objects}
      111    0.000    0.000    0.000    0.000 threading.py:506(is_set)
       26    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
       34    0.000    0.000    0.000    0.000 session.py:435(<listcomp>)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.arange}
       14    0.000    0.000    0.000    0.000 {built-in method builtins.max}
       26    0.000    0.000    0.000    0.000 numeric.py:1577(<listcomp>)
        2    0.000    0.000    0.000    0.000 training.py:250(_check_loss_and_target_compatibility)
       34    0.000    0.000    0.000    0.000 session.py:351(<listcomp>)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.any}
        2    0.000    0.000    0.000    0.000 training.py:369(<listcomp>)
       13    0.000    0.000    0.000    0.000 threading.py:239(__enter__)
       13    0.000    0.000    0.000    0.000 threading.py:242(__exit__)
       26    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
        2    0.000    0.000    0.000    0.000 backend.py:139(floatx)
       34    0.000    0.000    0.000    0.000 session.py:442(fetches)
       13    0.000    0.000    0.000    0.000 threading.py:251(_acquire_restore)
       52    0.000    0.000    0.000    0.000 {built-in method _operator.index}
        2    0.000    0.000    0.003    0.001 topology.py:713(stateful)
       34    0.000    0.000    0.000    0.000 session.py:450(targets)
       17    0.000    0.000    0.000    0.000 backend.py:554(<listcomp>)
       17    0.000    0.000    0.000    0.000 context.py:307(context)
       13    0.000    0.000    0.000    0.000 threading.py:254(_is_owned)
       14    0.000    0.000    0.000    0.000 {built-in method builtins.abs}
        2    0.000    0.000    0.005    0.002 {built-in method builtins.print}
        2    0.000    0.000    0.000    0.000 training.py:203(_check_array_lengths)
        2    0.000    0.000    0.000    0.000 training.py:467(_standardize_weights)
        1    0.000    0.000    0.001    0.001 callbacks.py:86(on_epoch_end)
        1    0.000    0.000    0.000    0.000 callbacks.py:287(on_epoch_begin)
        2    0.000    0.000    0.000    0.000 shape_base.py:255(expand_dims)
       13    0.000    0.000    0.000    0.000 threading.py:248(_release_save)
       26    0.000    0.000    0.000    0.000 callbacks.py:205(on_batch_begin)
       13    0.000    0.000    0.000    0.000 callbacks.py:208(on_batch_end)
       34    0.000    0.000    0.000    0.000 session.py:284(unique_fetches)
        2    0.000    0.000    0.000    0.000 {method 'reshape' of 'numpy.ndarray' objects}
       13    0.000    0.000    0.000    0.000 {method 'release' of '_thread.lock' objects}
       13    0.000    0.000    0.000    0.000 {method '__enter__' of '_thread.lock' objects}
        1    0.000    0.000    0.000    0.000 callbacks.py:340(on_epoch_end)
        2    0.000    0.000    0.000    0.000 numeric.py:146(ones)
       17    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        1    0.000    0.000    0.000    0.000 training.py:1422(_get_deduped_metrics_names)
        1    0.000    0.000    0.000    0.000 callbacks.py:72(on_epoch_begin)
        1    0.000    0.000    0.001    0.001 callbacks.py:319(on_epoch_end)
        1    0.000    0.000    0.000    0.000 generic_utils.py:261(__init__)
        1    0.000    0.000    0.000    0.000 callbacks.py:149(on_train_end)
        2    0.000    0.000    0.000    0.000 backend.py:313(learning_phase)
       13    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.lock' objects}
        2    0.000    0.000    0.000    0.000 topology.py:708(uses_learning_phase)
        4    0.000    0.000    0.000    0.000 training.py:149(_standardize_sample_or_class_weights)
        2    0.000    0.000    0.000    0.000 training.py:1406(<listcomp>)
        3    0.000    0.000    0.000    0.000 callbacks.py:190(__init__)
        1    0.000    0.000    0.000    0.000 callbacks.py:239(on_epoch_end)
        2    0.000    0.000    0.000    0.000 training.py:198(_standardize_sample_weights)
        1    0.000    0.000    0.000    0.000 copy.py:66(copy)
        2    0.000    0.000    0.000    0.000 training.py:996(_check_num_samples)
        1    0.000    0.000    0.000    0.000 callbacks.py:56(__init__)
        1    0.000    0.000    0.000    0.000 callbacks.py:68(set_model)
        1    0.000    0.000    0.000    0.000 callbacks.py:139(on_train_begin)
        1    0.000    0.000    0.000    0.000 callbacks.py:274(__init__)
        2    0.000    0.000    0.000    0.000 training.py:193(_standardize_class_weights)
        2    0.000    0.000    0.000    0.000 ops.py:4507(_GetGlobalDefaultGraph)
        2    0.000    0.000    0.000    0.000 _internal.py:243(__init__)
        1    0.000    0.000    0.000    0.000 training.py:961(_make_test_function)
        1    0.000    0.000    0.000    0.000 callbacks.py:64(set_params)
        1    0.000    0.000    0.000    0.000 callbacks.py:159(__iter__)
        3    0.000    0.000    0.000    0.000 callbacks.py:193(set_params)
        3    0.000    0.000    0.000    0.000 callbacks.py:196(set_model)
        1    0.000    0.000    0.000    0.000 callbacks.py:224(on_epoch_begin)
        2    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 training.py:939(_make_train_function)
        1    0.000    0.000    0.000    0.000 training.py:1600(<listcomp>)
        1    0.000    0.000    0.000    0.000 callbacks.py:199(on_epoch_begin)
        3    0.000    0.000    0.000    0.000 callbacks.py:214(on_train_end)
        1    0.000    0.000    0.000    0.000 callbacks.py:283(on_train_begin)
        1    0.000    0.000    0.000    0.000 callbacks.py:336(on_train_begin)
        2    0.000    0.000    0.000    0.000 topology.py:711(<listcomp>)
        4    0.000    0.000    0.000    0.000 training.py:165(<listcomp>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'count' of 'list' objects}
        1    0.000    0.000    0.000    0.000 callbacks.py:58(<listcomp>)
        1    0.000    0.000    0.000    0.000 callbacks.py:211(on_train_begin)
        2    0.000    0.000    0.000    0.000 _internal.py:268(get_data)
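
Outside a notebook, a comparable listing can be produced with the standard-library cProfile and pstats modules. This is a minimal sketch under that assumption: profile_call and busy_work are hypothetical names, and busy_work is only a placeholder for the real fit call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a report sorted by internal time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "tottime" is the internal-time ordering that %prun uses by default.
    stats.sort_stats("tottime").print_stats(10)  # keep the 10 most expensive entries
    return result, buffer.getvalue()


def busy_work(n):
    """Placeholder workload standing in for model.fit(...)."""
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10000)
print(report)
```

Profiling the same call once with and once without the checkpoint callback gives two reports that can be compared row by row.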

With ModelCheckpoint it took 867 seconds.
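
A quick calculation on the two %prun listings in this comment shows where the time goes. The figures are copied from the TF_Run rows, the totals, and the save_weights_to_hdf5_group row of the ModelCheckpoint run; the variable names are illustrative only.

```python
# Figures copied from the two %prun listings (seconds and call counts).
total_without, tf_run_without, calls_without = 3.674, 2.825, 17
total_with, tf_run_with, calls_with = 867.131, 814.923, 706
save_weights_cumtime = 809.039  # topology.py:1256(save_weights_to_hdf5_group)

# Share of the ModelCheckpoint run spent inside TF_Run and under weight saving.
share_tf_run_with = tf_run_with / total_with
share_save_weights = save_weights_cumtime / total_with

# Average time per TF_Run call in each run.
per_call_without = tf_run_without / calls_without
per_call_with = tf_run_with / calls_with

print(f"TF_Run share with ModelCheckpoint:     {share_tf_run_with:.1%}")
print(f"save_weights_to_hdf5_group share:      {share_save_weights:.1%}")
print(f"TF_Run per call without ModelCheckpoint: {per_call_without:.3f} s")
print(f"TF_Run per call with ModelCheckpoint:    {per_call_with:.3f} s")
```

On these numbers, most of the run is spent in TF_Run calls that fall under save_weights_to_hdf5_group, and a TF_Run call takes several times longer on average than in the run without the callback.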

 30983939 function calls (30862317 primitive calls) in 867.131 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      706  814.923    1.154  814.923    1.154 {built-in method _pywrap_tensorflow_internal.TF_Run}
        2    8.509    4.254    8.509    4.254 {built-in method _pywrap_tensorflow_internal.TF_ExtendGraph}
  9117966    4.909    0.000    4.914    0.000 {built-in method builtins.getattr}
    36964    4.885    0.000    5.354    0.000 ops.py:2530(_extract_stack)
27975/27294    2.117    0.000   24.272    0.001 op_def_library.py:339(_apply_op_helper)
      704    1.400    0.002   24.591    0.035 backend.py:544(_initialize_variables)
    36964    1.130    0.000   19.871    0.001 ops.py:2853(create_op)
    33844    1.124    0.000    1.124    0.000 {built-in method _pywrap_tensorflow_internal.RunCppShapeInference}
   136159    1.046    0.000    1.046    0.000 {method 'SerializeToString' of 'google.protobuf.pyext._message.CMessage' objects}
    79850    0.872    0.000    1.172    0.000 ops.py:1989(get_attr)
    33844    0.751    0.000    5.745    0.000 common_shapes.py:650(_call_cpp_shape_fn_impl)
    36964    0.713    0.000    8.672    0.000 ops.py:1367(__init__)
  2383601    0.670    0.000    0.722    0.000 {built-in method builtins.isinstance}
    36964    0.610    0.000    0.944    0.000 ops.py:1312(_NodeDef)
    63144    0.576    0.000    0.576    0.000 {method 'extend' of 'google.protobuf.pyext._message.RepeatedCompositeContainer' objects}
   137072    0.558    0.000    0.558    0.000 {method 'ByteSize' of 'google.protobuf.pyext._message.CMessage' objects}
   112690    0.483    0.000    0.787    0.000 tensor_shape.py:812(as_proto)
    36964    0.465    0.000    8.011    0.000 ops.py:2199(set_shapes_for_outputs)
   126500    0.404    0.000    0.771    0.000 functools.py:44(update_wrapper)
    66719    0.382    0.000    1.334    0.000 common_shapes.py:656(tensor_to_inference_result)
        2    0.368    0.184    1.285    0.643 ops.py:2691(_as_graph_def)
      704    0.358    0.001    0.365    0.001 ops.py:3343(get_collection)
    74789    0.358    0.000    0.996    0.000 ops.py:4575(_get_graph_from_inputs)
70374/63572    0.349    0.000    5.934    0.000 ops.py:843(internal_convert_to_tensor)
     2530    0.334    0.000    1.573    0.001 tf_should_use.py:31(_add_should_use_warning)
   164835    0.327    0.000    0.327    0.000 {method 'CopyFrom' of 'google.protobuf.pyext._message.CMessage' objects}
    36964    0.322    0.000    6.429    0.000 common_shapes.py:588(call_cpp_shape_fn)
   404444    0.319    0.000    0.429    0.000 dtypes.py:551(as_dtype)
    36964    0.309    0.000    0.494    0.000 ops.py:2579(_add_op)
     8989    0.294    0.000    1.142    0.000 tensor_util.py:306(make_tensor_proto)
  1817455    0.289    0.000    0.289    0.000 {method 'append' of 'list' objects}
    17872    0.275    0.000    0.275    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   174869    0.272    0.000    0.343    0.000 contextlib.py:59(__init__)
    76036    0.268    0.000    0.335    0.000 registry.py:78(lookup)
       17    0.268    0.016    0.268    0.016 training.py:400(<listcomp>)
    77414    0.265    0.000    0.304    0.000 tensor_shape.py:818(<listcomp>)
   142448    0.264    0.000    0.804    0.000 tensor_shape.py:420(__init__)
   233196    0.236    0.000    0.236    0.000 ops.py:1623(name)
    36964    0.223    0.000    0.223    0.000 {method '__deepcopy__' of 'google.protobuf.pyext._message.CMessage' objects}
349742/318070    0.219    0.000    1.190    0.000 {built-in method builtins.next}
  1505559    0.210    0.000    0.210    0.000 ops.py:2566(_extract_frame_info)
   588583    0.209    0.000    0.209    0.000 {method 'HasField' of 'google.protobuf.pyext._message.CMessage' objects}
     8989    0.209    0.000    4.059    0.000 constant_op.py:129(constant)
    89275    0.206    0.000    0.229    0.000 tensor_shape.py:27(__init__)
   759000    0.205    0.000    0.205    0.000 {built-in method builtins.setattr}
   139871    0.204    0.000    0.352    0.000 ops.py:4500(get_default)
174869/159033    0.198    0.000    0.568    0.000 contextlib.py:85(__exit__)
   108386    0.195    0.000    0.388    0.000 dtypes.py:235(is_compatible_with)
    36964    0.192    0.000    0.596    0.000 copy.py:132(deepcopy)
    36964    0.191    0.000    0.482    0.000 ops.py:1824(_recompute_node_def)
   153124    0.182    0.000    0.225    0.000 ops.py:4334(get_controller)
    39084    0.179    0.000    0.218    0.000 ops.py:3535(unique_name)
    46474    0.177    0.000    1.515    0.000 ops.py:4899(__enter__)
    38011    0.175    0.000    0.380    0.000 ops.py:260(__init__)
    83438    0.175    0.000    0.175    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
   199415    0.170    0.000    0.170    0.000 ops.py:1925(type)
    92948    0.170    0.000    0.451    0.000 ops.py:3423(name_scope)
    16566    0.164    0.000    0.164    0.000 {built-in method builtins.sorted}
    33844    0.164    0.000    0.295    0.000 common_shapes.py:703(<listcomp>)
   174869    0.158    0.000    0.501    0.000 contextlib.py:157(helper)
     1719    0.148    0.000    0.332    0.000 attrs.py:102(create)
    45948    0.144    0.000    0.144    0.000 {built-in method FromString}
   233948    0.144    0.000    0.207    0.000 dtypes.py:96(base_dtype)
      704    0.136    0.000   24.750    0.035 backend.py:345(get_session)
      706    0.135    0.000  825.310    1.169 session.py:1290(_run_fn)
   102808    0.134    0.000    0.184    0.000 ops.py:1869(inputs)
   140771    0.130    0.000    0.237    0.000 dtypes.py:258(__eq__)
    65968    0.128    0.000    0.230    0.000 op_def_library.py:52(_SatisfiesTypeConstraint)
   244058    0.127    0.000    0.137    0.000 {method 'get' of 'dict' objects}
    36964    0.127    0.000    0.138    0.000 ops.py:3226(_next_id)
    34552    0.125    0.000    0.231    0.000 errors_impl.py:467(__exit__)
   140575    0.117    0.000    0.150    0.000 ops.py:4317(get_default)
   139871    0.116    0.000    0.469    0.000 ops.py:4542(get_default_graph)
    37297    0.110    0.000    0.360    0.000 tensor_shape.py:442(<listcomp>)
   129842    0.110    0.000    0.349    0.000 tensor_shape.py:381(as_dimension)
   111472    0.108    0.000    0.208    0.000 compat.py:46(as_bytes)
    16921    0.108    0.000    0.189    0.000 ops.py:1584(colocation_groups)
    36964    0.106    0.000    0.163    0.000 ops.py:3889(_control_dependencies_for_inputs)
    46474    0.106    0.000    0.154    0.000 ops.py:4885(__init__)
    39271    0.105    0.000    0.105    0.000 {built-in method numpy.core.multiarray.array}
        1    0.104    0.104  809.039  809.039 topology.py:1256(save_weights_to_hdf5_group)
   132991    0.104    0.000    0.144    0.000 ops.py:297(graph)
        1    0.102    0.102   11.570   11.570 gradients_impl.py:375(gradients)
     7177    0.100    0.000    0.414    0.000 tensor_util.py:488(MakeNdarray)
    34550    0.093    0.000    0.206    0.000 op_def_library.py:172(_MakeType)
    33844    0.092    0.000    1.427    0.000 common_shapes.py:664(<listcomp>)
    76562    0.091    0.000    0.348    0.000 ops.py:3237(as_default)
    36964    0.091    0.000    0.106    0.000 copy.py:252(_keep_alive)
174869/159033    0.090    0.000    0.880    0.000 contextlib.py:79(__enter__)
    45683    0.087    0.000    0.783    0.000 tensor_shape.py:555(merge_with)
     2530    0.085    0.000    0.085    0.000 {built-in method builtins.dir}
   119690    0.084    0.000    0.084    0.000 {method 'encode' of 'str' objects}
    74132    0.083    0.000    0.719    0.000 tensor_shape.py:844(as_shape)
    33844    0.083    0.000    0.083    0.000 common_shapes.py:706(<listcomp>)
12367/10660    0.083    0.000    0.687    0.000 tensor_util.py:606(_ConstantValue)
    77842    0.081    0.000    0.282    0.000 ops.py:1446(<genexpr>)
     1716    0.081    0.000    0.117    0.000 group.py:42(create_group)
    46474    0.080    0.000    0.364    0.000 ops.py:4934(__exit__)
     1409    0.078    0.000  826.077    0.586 session.py:1036(_run)
    49867    0.076    0.000    0.168    0.000 ops.py:435(_as_node_def_input)
   381934    0.075    0.000    0.075    0.000 dtypes.py:128(as_datatype_enum)
310845/307009    0.075    0.000    0.077    0.000 {built-in method builtins.len}
    34552    0.072    0.000    0.072    0.000 {built-in method _pywrap_tensorflow_internal.TF_NewStatus}
   125491    0.072    0.000    0.072    0.000 context.py:148(in_eager_mode)
   250819    0.069    0.000    0.069    0.000 dtypes.py:83(_is_ref_dtype)
     2530    0.068    0.000    0.073    0.000 {built-in method builtins.__build_class__}
   126500    0.067    0.000    0.067    0.000 tf_should_use.py:47(override_method)
    36964    0.066    0.000    0.446    0.000 ops.py:1439(<listcomp>)
     6975    0.066    0.000    6.996    0.001 math_ops.py:881(binary_op_wrapper)
    36970    0.066    0.000    0.066    0.000 {method 'extend' of 'google.protobuf.pyext._message.RepeatedScalarContainer' objects}
    43044    0.065    0.000    0.365    0.000 {built-in method builtins.all}
67952/55950    0.064    0.000    0.228    0.000 op_def_library.py:248(_MaybeColocateWith)
    40491    0.063    0.000    0.706    0.000 ops.py:376(set_shape)
      694    0.063    0.000    0.090    0.000 dataset.py:53(make_new_dset)
    10633    0.063    0.000    0.252    0.000 fromnumeric.py:2456(prod)
   226468    0.062    0.000    0.062    0.000 ops.py:1839(outputs)
      694    0.062    0.000    0.205    0.000 dataset.py:508(__setitem__)
    75635    0.061    0.000    0.061    0.000 {method 'WhichOneof' of 'google.protobuf.pyext._message.CMessage' objects}
    33844    0.060    0.000    0.060    0.000 common_shapes.py:708(<listcomp>)
    90752    0.059    0.000    0.084    0.000 ops.py:372(get_shape)
    45629    0.059    0.000    0.093    0.000 six.py:586(iteritems)
    42157    0.058    0.000    0.103    0.000 tensor_shape.py:852(unknown_shape)
   199127    0.056    0.000    0.062    0.000 ops.py:287(op)
   179595    0.056    0.000    0.056    0.000 context.py:307(context)
    34552    0.056    0.000    0.175    0.000 errors_impl.py:463(__enter__)
   201176    0.055    0.000    0.055    0.000 ops.py:292(dtype)
   126500    0.055    0.000    0.055    0.000 functools.py:74(wraps)
    36964    0.055    0.000    6.484    0.000 ops.py:2158(call_with_requiring)
   179102    0.052    0.000    0.052    0.000 ops.py:1947(node_def)
   237085    0.052    0.000    0.052    0.000 tensor_shape.py:78(value)
     3470    0.052    0.000    2.790    0.001 gen_state_ops.py:23(assign)
   126502    0.051    0.000    0.051    0.000 {method 'update' of 'dict' objects}
    27975    0.051    0.000    0.116    0.000 op_def_library.py:86(<listcomp>)
    36964    0.051    0.000    0.052    0.000 ops.py:3743(_apply_device_functions)
   102808    0.050    0.000    0.050    0.000 ops.py:1849(__init__)
    27975    0.050    0.000    0.186    0.000 op_def_library.py:83(_Flatten)
    22393    0.049    0.000    0.103    0.000 ops.py:302(name)
     1388    0.049    0.000    3.252    0.002 variables.py:220(_init_from_args)
    55766    0.049    0.000    0.159    0.000 ops.py:769(_TensorTensorConversionFunction)
    36964    0.049    0.000    0.217    0.000 ops.py:1827(<listcomp>)
    76036    0.048    0.000    0.067    0.000 compat.py:68(as_text)
   174442    0.047    0.000    0.047    0.000 ops.py:1942(graph)
    34552    0.047    0.000    0.047    0.000 {built-in method _pywrap_tensorflow_internal.TF_GetCode}
    34552    0.047    0.000    0.119    0.000 c_api_util.py:29(__init__)
    48486    0.047    0.000    0.047    0.000 op_def_library.py:63(_IsListParameter)
   181448    0.046    0.000    0.046    0.000 ops.py:1634(_id)
    38011    0.046    0.000    0.046    0.000 {built-in method _pywrap_tensorflow_internal.TFE_Py_UID}
    49867    0.046    0.000    0.064    0.000 ops.py:422(_add_consumer)
    23828    0.046    0.000    0.074    0.000 ops.py:3597(colocate_with)
   129129    0.045    0.000    0.045    0.000 {built-in method builtins.iter}
53130/35420    0.044    0.000    0.062    0.000 tf_should_use.py:61(__getattribute__)
    32115    0.044    0.000    0.044    0.000 op_def_library.py:37(_Attr)
    44354    0.044    0.000    0.044    0.000 ops.py:2368(_name_from_scope_name)
    48486    0.043    0.000    0.093    0.000 op_def_library.py:550(<listcomp>)
    70636    0.042    0.000    0.066    0.000 op_def_library.py:79(_IsListValue)
    36980    0.041    0.000    0.041    0.000 {built-in method builtins.max}
     4132    0.041    0.000    0.041    0.000 {built-in method h5py.h5t.py_create}
    36964    0.038    0.000    0.053    0.000 ops.py:3907(<listcomp>)
    91255    0.038    0.000    0.038    0.000 {method 'pop' of 'list' objects}
    27975    0.038    0.000    0.038    0.000 op_def_library.py:781(<listcomp>)
   113668    0.038    0.000    0.038    0.000 ops.py:2570(_check_not_finalized)
      694    0.037    0.000    0.049    0.000 group.py:248(__setitem__)
     3488    0.036    0.000    2.196    0.001 gen_math_ops.py:2709(_mul)
2770/1750    0.036    0.000    0.363    0.000 tensor_util.py:751(constant_value_as_shape)
   170431    0.036    0.000    0.036    0.000 {built-in method builtins.id}
12367/10660    0.036    0.000    0.751    0.000 tensor_util.py:714(constant_value)
    55485    0.036    0.000    0.036    0.000 context.py:144(in_graph_mode)
    51260    0.035    0.000    0.052    0.000 ops.py:1852(__iter__)
    37655    0.034    0.000    0.089    0.000 ops.py:4560(_assert_same_graph)
        1    0.033    0.033   29.816   29.816 optimizers.py:443(get_updates)
    34552    0.033    0.000    0.059    0.000 c_api_util.py:32(__del__)
    38011    0.032    0.000    0.078    0.000 ops.py:172(uid)
        1    0.032    0.032    0.056    0.056 gradients_impl.py:106(_MarkReachedOps)
     2776    0.032    0.000    0.033    0.000 dataset.py:238(dtype)
     3833    0.032    0.000    0.064    0.000 gradients_impl.py:634(_UpdatePendingAndEnqueueReady)
    49143    0.031    0.000    0.099    0.000 ops.py:109(_as_graph_element)
    36964    0.031    0.000    0.031    0.000 {built-in method sys.exc_info}
     1409    0.031    0.000    0.239    0.000 session.py:401(__init__)
     3069    0.031    0.000    0.070    0.000 numeric.py:2358(array_equal)
     4871    0.029    0.000    0.029    0.000 {tensorflow.python.framework.fast_tensor_util.AppendFloat32ArrayToTensorProto}
     2530    0.028    0.000    1.751    0.001 gen_state_ops.py:223(is_variable_initialized)
    96525    0.028    0.000    0.028    0.000 {method 'extend' of 'list' objects}
    11914    0.028    0.000    0.129    0.000 ops.py:4277(colocate_with)
      694    0.028    0.000    0.037    0.000 dataset.py:317(__init__)
     5899    0.027    0.000    0.114    0.000 tensor_shape.py:693(is_compatible_with)
    53565    0.027    0.000    0.027    0.000 {method 'startswith' of 'str' objects}
        1    0.027    0.027    0.120    0.120 gradients_impl.py:149(_PendingCount)
    13777    0.026    0.000    0.084    0.000 ops.py:3037(_as_graph_element_locked)
    31889    0.026    0.000    0.104    0.000 tensor_shape.py:455(<listcomp>)
    69939    0.026    0.000    0.026    0.000 {method 'pop' of 'dict' objects}
     1409    0.026    0.000    0.060    0.000 session.py:458(build_results)
    93534    0.025    0.000    0.025    0.000 ops.py:314(shape)
    34552    0.025    0.000    0.025    0.000 {built-in method _pywrap_tensorflow_internal.TF_DeleteStatus}
     1388    0.025    0.000    0.985    0.001 gen_state_ops.py:893(_variable_v2)
     1719    0.025    0.000    0.366    0.000 attrs.py:87(__setitem__)
     2410    0.025    0.000    0.026    0.000 base.py:119(get_lcpl)
    23917    0.025    0.000    0.042    0.000 tensor_shape.py:83(is_compatible_with)
     8983    0.025    0.000    0.099    0.000 tensor_util.py:295(_AssertCompatible)
      166    0.024    0.000    0.024    0.000 {method 'acquire' of '_thread.lock' objects}
    84741    0.024    0.000    0.024    0.000 {method 'items' of 'dict' objects}
     2084    0.024    0.000    1.318    0.001 gen_math_ops.py:4619(_sub)
    31734    0.023    0.000    0.081    0.000 dtypes.py:268(__ne__)
     3833    0.023    0.000    3.106    0.001 gradients_impl.py:791(_AggregatedGrads)
     1020    0.023    0.000    3.385    0.003 array_ops.py:411(_SliceHelper)
    36964    0.023    0.000    0.029    0.000 ops.py:3924(_record_op_seen_by_control_dependencies)
19813/18793    0.023    0.000    4.141    0.000 ops.py:786(convert_to_tensor)
     8648    0.022    0.000    0.041    0.000 tensor_util.py:137(GetFromNumpyDTypeDict)
    18521    0.022    0.000    0.041    0.000 context.py:322(in_graph_mode)
     2086    0.022    0.000    1.321    0.001 gen_math_ops.py:166(add)
    57140    0.022    0.000    0.031    0.000 ops.py:473(__hash__)
    12170    0.021    0.000    0.077    0.000 tensor_shape.py:113(merge_with)
    15986    0.021    0.000    0.040    0.000 abc.py:178(__instancecheck__)
     1731    0.021    0.000    2.033    0.001 gen_array_ops.py:3866(reshape)
    43878    0.021    0.000    0.028    0.000 tensor_shape.py:473(ndims)
     3776    0.021    0.000    0.021    0.000 {tensorflow.python.framework.fast_tensor_util.AppendInt32ArrayToTensorProto}
      694    0.021    0.000    0.196    0.000 group.py:53(create_dataset)
    22227    0.021    0.000    0.021    0.000 ops.py:1639(device)
     1020    0.020    0.000    1.792    0.002 gen_array_ops.py:5270(strided_slice)
     6145    0.020    0.000    0.020    0.000 {method 'repeat' of 'numpy.ndarray' objects}
    74789    0.020    0.000    0.020    0.000 ops.py:2847(building_function)
     7230    0.019    0.000    3.179    0.000 constant_op.py:226(_constant_tensor_conversion_function)
    27975    0.019    0.000    0.019    0.000 op_def_library.py:88(<listcomp>)
     1388    0.019    0.000    0.019    0.000 dataset.py:221(shape)
      694    0.019    0.000    0.020    0.000 filters.py:73(generate_dcpl)
    21886    0.019    0.000    0.019    0.000 _weakrefset.py:70(__contains__)
     3161    0.018    0.000    0.026    0.000 gradients_impl.py:674(_SetGrad)
     5482    0.018    0.000    0.018    0.000 {method 'astype' of 'numpy.ndarray' objects}
    13777    0.017    0.000    0.101    0.000 ops.py:3002(as_graph_element)
    16697    0.017    0.000    0.045    0.000 {built-in method builtins.hasattr}
     2107    0.017    0.000    0.085    0.000 gradients_impl.py:729(_LogOpGradients)
     1716    0.016    0.000    0.032    0.000 base.py:262(attrs)
     1409    0.016    0.000    0.999    0.001 array_ops.py:274(shape_internal)
     7327    0.016    0.000    0.038    0.000 tensor_shape.py:607(assert_same_rank)
     1820    0.016    0.000    0.016    0.000 {built-in method posix.urandom}
     9289    0.015    0.000    0.049    0.000 base.py:109(_e)
    17690    0.015    0.000    0.022    0.000 {method 'add' of 'set' objects}
4667/1409    0.015    0.000    0.130    0.000 session.py:215(for_fetch)
     1719    0.014    0.000    0.020    0.000 uuid.py:106(__init__)
     1388    0.014    0.000    0.855    0.001 gen_array_ops.py:2058(identity)
     3471    0.014    0.000    3.088    0.001 variables.py:752(_run_op)
12386/8983    0.014    0.000    0.018    0.000 tensor_util.py:170(_GetDenseDimensions)
    39350    0.014    0.000    0.014    0.000 {built-in method builtins.issubclass}
     8648    0.014    0.000    0.054    0.000 tensor_util.py:145(GetNumpyAppendFn)
    33844    0.014    0.000    0.014    0.000 common_shapes.py:674(<listcomp>)
     1391    0.014    0.000    1.565    0.001 math_ops.py:907(r_binary_op_wrapper)
    27975    0.014    0.000    0.014    0.000 {method 'values' of 'dict' objects}
    24025    0.013    0.000    0.054    0.000 numeric.py:424(asarray)
     1057    0.013    0.000    0.720    0.001 gen_array_ops.py:4479(shape)
     1408    0.013    0.000    0.018    0.000 session.py:347(build_results)
2776/1388    0.013    0.000    0.021    0.000 variables.py:763(_build_initializer_expr)
    33844    0.013    0.000    0.013    0.000 common_shapes.py:666(<listcomp>)
     8648    0.013    0.000    0.013    0.000 {method 'ravel' of 'numpy.ndarray' objects}
     2108    0.012    0.000    0.089    0.000 ops.py:2126(get_gradient_function)
     3470    0.012    0.000    2.808    0.001 state_ops.py:248(assign)
    25804    0.012    0.000    0.012    0.000 op_def_library.py:45(_AttrValue)
     7177    0.012    0.000    0.012    0.000 tensor_util.py:503(<listcomp>)
    33844    0.012    0.000    0.012    0.000 ops.py:2627(graph_def_versions)
     6665    0.012    0.000    0.031    0.000 tf_logging.py:120(vlog)
    38358    0.012    0.000    0.012    0.000 ops.py:2675(_get_control_flow_context)
     4132    0.012    0.000    0.080    0.000 fromnumeric.py:1885(product)
      695    0.012    0.000    0.998    0.001 clip_ops.py:33(clip_by_value)
     7179    0.011    0.000    0.011    0.000 {method 'reshape' of 'numpy.ndarray' objects}
      706    0.011    0.000  825.390    1.169 session.py:1258(_do_run)
     7668    0.011    0.000    0.075    0.000 gradients_impl.py:314(_maybe_colocate_with)
7516/4456    0.011    0.000    0.018    0.000 tensor_util.py:268(_FilterNotTensor)
    36964    0.011    0.000    0.011    0.000 ops.py:2605(_c_graph)
     1388    0.011    0.000    0.032    0.000 selections.py:272(broadcast)
    13542    0.011    0.000    0.016    0.000 tensor_shape.py:501(__getitem__)
     7389    0.011    0.000    0.014    0.000 variables.py:882(dtype)
     4187    0.010    0.000    0.024    0.000 tensor_shape.py:785(is_fully_defined)
     1388    0.010    0.000    3.285    0.002 backend.py:493(variable)
      680    0.010    0.000    2.988    0.004 gradients_impl.py:59(_IndexedSlicesToTensor)
     8983    0.010    0.000    0.062    0.000 tensor_util.py:297(<listcomp>)
      694    0.010    0.000    0.600    0.001 math_ops.py:1729(matmul)
      340    0.010    0.000    3.957    0.012 array_grad.py:355(_GatherGrad)
     6698    0.010    0.000    0.020    0.000 variables.py:719(_TensorConversionFunction)
      708    0.010    0.000    0.673    0.001 gen_math_ops.py:4654(_sum)
      694    0.010    0.000    0.532    0.001 gen_math_ops.py:2404(_mat_mul)
     1408    0.010    0.000    0.024    0.000 session.py:295(_uniquify_fetches)
     1396    0.010    0.000    0.036    0.000 ops.py:3933(control_dependencies)
     1409    0.010    0.000  826.087    0.586 session.py:781(run)
     2107    0.009    0.000    7.578    0.004 gradients_impl.py:338(_MaybeCompile)
    37658    0.009    0.000    0.009    0.000 {method 'reverse' of 'list' objects}
    19347    0.009    0.000    0.009    0.000 dtypes.py:123(as_numpy_dtype)
     1388    0.009    0.000    4.016    0.003 backend.py:821(zeros)
     1021    0.009    0.000    0.009    0.000 {built-in method numpy.core.multiarray.fromstring}
    17631    0.009    0.000    0.009    0.000 ops.py:1864(__getitem__)
        1    0.009    0.009    0.009    0.009 files.py:278(close)
     2776    0.009    0.000    0.018    0.000 ops.py:3277(add_to_collection)
      694    0.009    0.000    0.029    0.000 selections.py:250(__getitem__)
     1409    0.009    0.000    0.029    0.000 session.py:1381(_update_with_movers)
      698    0.009    0.000    0.641    0.001 math_ops.py:973(_truediv_python3)
     3488    0.008    0.000    2.205    0.001 math_ops.py:1113(_mul_dispatch)
     7667    0.008    0.000    0.023    0.000 control_flow_ops.py:1319(IsLoopExit)
    10633    0.008    0.000    0.189    0.000 _methods.py:34(_prod)
     3259    0.008    0.000    0.062    0.000 session.py:251(__init__)
      695    0.008    0.000    0.444    0.001 gen_math_ops.py:2636(minimum)
     2530    0.008    0.000    1.767    0.001 state_ops.py:171(is_variable_initialized)
     6145    0.008    0.000    0.037    0.000 fromnumeric.py:382(repeat)
      694    0.008    0.000    0.050    0.000 selections.py:27(select)
     4859    0.008    0.000    0.019    0.000 op_def_library.py:772(<listcomp>)
      680    0.008    0.000    0.659    0.001 gen_math_ops.py:4932(unsorted_segment_sum)
      695    0.008    0.000    0.394    0.001 gen_math_ops.py:4485(sqrt)
     8643    0.008    0.000    0.015    0.000 context.py:327(in_eager_mode)
      700    0.008    0.000    0.450    0.001 gen_math_ops.py:2505(maximum)
    12170    0.008    0.000    0.030    0.000 tensor_shape.py:99(assert_is_compatible_with)
     5899    0.008    0.000    0.014    0.000 session.py:1038(_feed_fn)
     6665    0.008    0.000    0.019    0.000 __init__.py:1357(log)
     1719    0.008    0.000    0.041    0.000 uuid.py:600(uuid4)
      694    0.007    0.000    0.008    0.000 selections.py:147(__init__)
    15180    0.007    0.000    0.013    0.000 tf_should_use.py:48(fn)
      710    0.007    0.000    0.603    0.001 gen_math_ops.py:3323(_real_div)
      695    0.007    0.000    2.007    0.003 backend.py:1589(sqrt)
      694    0.007    0.000    0.007    0.000 filters.py:207(get_filters)
       17    0.007    0.000    0.034    0.002 nest.py:226(flatten_dict_items)
      694    0.007    0.000    1.888    0.003 gen_math_ops.py:4552(square)
    27972    0.007    0.000    0.007    0.000 execute.py:86(record_gradient)
     1719    0.007    0.000    0.009    0.000 base.py:43(guess_dtype)
     1412    0.007    0.000    0.037    0.000 session.py:516(<listcomp>)
        1    0.007    0.007    0.007    0.007 files.py:304(flush)
    12002    0.007    0.000    0.009    0.000 op_def_library.py:638(<genexpr>)
    10077    0.007    0.000    0.009    0.000 tensor_shape.py:787(<genexpr>)
     2776    0.007    0.000    0.022    0.000 backend.py:716(int_shape)
     4516    0.007    0.000    0.009    0.000 tensor_shape.py:810(<listcomp>)
     8199    0.007    0.000    0.026    0.000 ops.py:4220(prevent_feeding)
     6145    0.007    0.000    0.029    0.000 fromnumeric.py:50(_wrapfunc)
     6665    0.007    0.000    0.010    0.000 __init__.py:1542(isEnabledFor)
     6605    0.007    0.000    0.028    0.000 session.py:1286(<genexpr>)
      694    0.006    0.000    0.019    0.000 selections.py:429(_handle_simple)
     1720    0.006    0.000    0.008    0.000 group.py:34(__init__)
     5243    0.006    0.000    0.006    0.000 dtypes.py:115(is_numpy_compatible)
      342    0.006    0.000    0.006    0.000 tensor_util.py:845(<listcomp>)
     5899    0.006    0.000    0.016    0.000 session_ops.py:253(_get_handle_feeder)
      699    0.006    0.000    0.037    0.000 ops.py:947(internal_convert_n_to_tensor)
     2115    0.006    0.000    0.006    0.000 ops.py:2611(version)
     1408    0.006    0.000    0.126    0.000 session.py:334(__init__)
        1    0.006    0.006    0.029    0.029 base.py:1790(updates)
     3132    0.006    0.000    0.012    0.000 execute.py:124(make_type)
      704    0.006    0.000    3.355    0.005 backend.py:554(<listcomp>)
     5899    0.006    0.000    0.010    0.000 ops.py:4224(is_feedable)
     1020    0.006    0.000    1.800    0.002 array_ops.py:594(strided_slice)
     3446    0.006    0.000    0.009    0.000 <frozen importlib._bootstrap>:416(parent)
      695    0.006    0.000    0.435    0.001 math_ops.py:452(sqrt)
     2770    0.006    0.000    0.034    0.000 tensor_shape.py:823(__eq__)
     3833    0.006    0.000    0.007    0.000 gradients_impl.py:701(_GetGrads)
       17    0.006    0.000   27.871    1.639 backend.py:2462(__call__)
     9043    0.005    0.000    0.007    0.000 op_def_library.py:165(_MakeBool)
     1388    0.005    0.000    0.717    0.001 init_ops.py:194(__call__)
     3061    0.005    0.000    1.512    0.000 array_ops.py:836(stack)
      694    0.005    0.000    1.930    0.003 math_ops.py:430(square)
      344    0.005    0.000    3.038    0.009 gradients_impl.py:747(_MultiDeviceAddN)
     2530    0.005    0.000    3.349    0.001 tf_should_use.py:106(wrapped)
     2770    0.005    0.000    0.094    0.000 tensor_shape.py:635(with_rank)
     3063    0.005    0.000    0.010    0.000 array_ops.py:938(_get_dtype_from_nested_lists)
    12397    0.005    0.000    0.005    0.000 tensor_util.py:205(_FirstNotNone)
     8093    0.005    0.000    0.010    0.000 ops.py:1051(internal_convert_to_tensor_or_indexed_slices)
     4516    0.005    0.000    0.014    0.000 tensor_shape.py:799(as_list)
     5800    0.005    0.000    0.007    0.000 op_def_library.py:147(_MakeInt)
      341    0.005    0.000    2.086    0.006 math_grad.py:707(_AddGrad)
     2107    0.005    0.000    7.530    0.004 gradients_impl.py:581(<lambda>)
     3481    0.005    0.000    0.025    0.000 tensor_util.py:241(_FilterFloat)
     1719    0.005    0.000    0.006    0.000 attrs.py:51(__init__)
     2530    0.005    0.000    0.005    0.000 tf_should_use.py:52(TFShouldUseWarningWrapper)
      694    0.005    0.000    0.008    0.000 selections.py:406(_expand_ellipsis)
     1390    0.005    0.000    0.564    0.000 backend.py:425(_to_tensor)
    19810    0.005    0.000    0.005    0.000 session.py:708(graph)
     9043    0.005    0.000    0.007    0.000 execute.py:117(make_bool)
      344    0.005    0.000    0.238    0.001 gen_math_ops.py:877(cast)
      340    0.005    0.000    0.552    0.002 gen_array_ops.py:680(_concat_v2)
     1388    0.005    0.000    0.023    0.000 ops.py:3296(add_to_collections)
     4859    0.005    0.000    0.006    0.000 dtypes.py:88(_as_ref)
     5899    0.005    0.000    0.022    0.000 nest.py:114(is_sequence)
     1719    0.005    0.000    0.005    0.000 uuid.py:280(hex)
     1409    0.005    0.000    0.007    0.000 session.py:435(<listcomp>)
      354    0.005    0.000    0.263    0.001 math_ops.py:708(cast)
      694    0.005    0.000    0.013    0.000 selections.py:244(__init__)
     1388    0.004    0.000    0.054    0.000 op_def_library.py:183(_MakeShape)
      340    0.004    0.000    0.229    0.001 gen_array_ops.py:4563(size)
     2107    0.004    0.000    0.012    0.000 gradients_impl.py:265(_VerifyGeneratedGradients)
    12669    0.004    0.000    0.004    0.000 ops.py:414(consumers)
      354    0.004    0.000    0.263    0.001 gen_array_ops.py:513(_broadcast_gradient_args)
      344    0.004    0.000    3.009    0.009 gen_math_ops.py:201(_add_n)
    12671    0.004    0.000    0.004    0.000 {method 'extend' of 'collections.deque' objects}
    15454    0.004    0.000    0.004    0.000 dtypes.py:286(__hash__)
     3480    0.004    0.000    0.014    0.000 ops.py:5058(_assert_collection_is_ok)
     1388    0.004    0.000    3.260    0.002 variables.py:123(__init__)
      340    0.004    0.000    0.496    0.001 gen_array_ops.py:1233(_expand_dims)
     4176    0.004    0.000    0.006    0.000 ops.py:3869(add_op)
     3432    0.004    0.000    0.017    0.000 base.py:258(get_updates_for)
     1396    0.004    0.000    0.006    0.000 ops.py:3847(__enter__)
     1716    0.004    0.000    0.007    0.000 base.py:192(weights)
    17771    0.004    0.000    0.004    0.000 {built-in method builtins.callable}
     2770    0.004    0.000    0.006    0.000 tensor_shape.py:46(__eq__)
     7668    0.004    0.000    0.005    0.000 gradients_impl.py:541(<genexpr>)
     3067    0.004    0.000    0.031    0.000 {method 'all' of 'numpy.ndarray' objects}
     2107    0.004    0.000    0.016    0.000 gradients_impl.py:744(<listcomp>)
     3839    0.004    0.000    0.005    0.000 gradients_impl.py:829(<listcomp>)
     2107    0.004    0.000    0.018    0.000 gradients_impl.py:742(<listcomp>)
     5103    0.004    0.000    0.005    0.000 execute.py:99(make_int)
     1719    0.004    0.000    0.004    0.000 {built-in method from_bytes}
    11772    0.004    0.000    0.004    0.000 base.py:236(id)
      347    0.004    0.000    0.618    0.002 math_grad.py:912(_MatMulGrad)
     1716    0.004    0.000  807.978    0.471 backend.py:2347(batch_get_value)
     2783    0.004    0.000    0.010    0.000 execute.py:110(make_str)
    28332    0.004    0.000    0.004    0.000 {method 'popleft' of 'collections.deque' objects}
     1388    0.004    0.000    0.011    0.000 execute.py:134(make_shape)
      342    0.004    0.000    0.254    0.001 array_ops.py:328(size_internal)
     2783    0.004    0.000    0.008    0.000 op_def_library.py:158(_MakeStr)
    17627    0.004    0.000    0.004    0.000 tensor_shape.py:468(dims)
      688    0.003    0.000    3.022    0.004 math_ops.py:1974(add_n)
     2082    0.003    0.000    0.024    0.000 variables.py:855(name)
     5148    0.003    0.000    0.003    0.000 base.py:154(name)
     1720    0.003    0.000    0.005    0.000 <frozen importlib._bootstrap>:997(_handle_fromlist)
     3446    0.003    0.000    0.003    0.000 {method 'rpartition' of 'str' objects}
     1388    0.003    0.000    0.008    0.000 backend.py:770(dtype)
     5930    0.003    0.000    0.007    0.000 {built-in method builtins.any}
     1396    0.003    0.000    0.006    0.000 ops.py:3857(__exit__)
      691    0.003    0.000    0.010    0.000 ops.py:1086(internal_convert_n_to_tensor_or_indexed_slices)
     1388    0.003    0.000    0.861    0.001 array_ops.py:114(identity)
     5967    0.003    0.000    0.005    0.000 gradients_impl.py:733(_FilterGrad)
     2082    0.003    0.000    1.676    0.001 backend.py:1095(update)
     3833    0.003    0.000    0.003    0.000 ops.py:2776(_is_function)
     1388    0.003    0.000    0.008    0.000 ops.py:360(_shape_as_list)
     1388    0.003    0.000    0.988    0.001 state_ops.py:110(variable_op_v2)
     3063    0.003    0.000    0.014    0.000 array_ops.py:959(_autopacking_conversion_function)
      706    0.003    0.000   10.225    0.014 session.py:1338(_extend_graph)
     6665    0.003    0.000    0.003    0.000 __init__.py:1528(getEffectiveLevel)
      704    0.003    0.000    0.004    0.000 ops.py:4401(get_default_session)
     7394    0.003    0.000    0.003    0.000 variables.py:399(_as_graph_element)
     3060    0.003    0.000    0.007    0.000 tensor_util.py:270(<listcomp>)
        1    0.003    0.003  837.240  837.240 training.py:1039(_fit_loop)
     1396    0.003    0.000    0.048    0.000 ops.py:4287(control_dependencies)
     1408    0.003    0.000    0.096    0.000 session.py:341(<listcomp>)
1730/1046    0.003    0.000    0.009    0.000 tensor_util.py:234(_FilterInt)
        3    0.003    0.001    0.071    0.024 control_flow_ops.py:2907(group)
     2530    0.003    0.000    1.770    0.001 variables.py:1429(is_variable_initialized)
     3259    0.003    0.000    0.004    0.000 ops.py:4232(is_fetchable)
      704    0.003    0.000    0.375    0.001 variables.py:1225(global_variables)
     2776    0.003    0.000    0.003    0.000 backend.py:393(_convert_string_dtype)
     2481    0.003    0.000    0.005    0.000 ops.py:1595(<listcomp>)
        5    0.003    0.001    0.008    0.002 ops.py:1831(<listcomp>)
      704    0.003    0.000    0.372    0.001 ops.py:4831(get_collection)
     1381    0.003    0.000    0.004    0.000 selections.py:475(_translate_slice)
     3432    0.003    0.000    0.003    0.000 base.py:2185(_to_list)
     3259    0.003    0.000    0.003    0.000 session.py:287(build_results)
     8989    0.003    0.000    0.003    0.000 ops.py:1443(<listcomp>)
      706    0.003    0.000  825.312    1.169 session.py:1321(_do_call)
     2107    0.002    0.000    0.002    0.000 {method 'replace' of 'str' objects}
     3836    0.002    0.000    0.003    0.000 ops.py:1855(__len__)
     5899    0.002    0.000    0.019    0.000 session_ops.py:273(_get_handle_mover)
     3259    0.002    0.000    0.007    0.000 session.py:437(_assert_fetchable)
     2530    0.002    0.000    0.002    0.000 tf_should_use.py:55(__init__)
     2779    0.002    0.000    0.004    0.000 op_def_library.py:457(<genexpr>)
     1396    0.002    0.000    0.002    0.000 ops.py:3812(__init__)
        1    0.002    0.002    2.021    2.021 optimizers.py:456(<listcomp>)
        1    0.002    0.002    2.019    2.019 optimizers.py:457(<listcomp>)
     2109    0.002    0.000    0.002    0.000 ops.py:1858(__bool__)
     1388    0.002    0.000    0.003    0.000 init_ops.py:189(__init__)
     2414    0.002    0.000    0.002    0.000 base.py:269(__init__)
     1388    0.002    0.000    0.031    0.000 ops.py:4798(add_to_collections)
     4214    0.002    0.000    0.002    0.000 ops.py:3397(_original_op)
     1388    0.002    0.000    0.003    0.000 variables.py:897(shape)
     2481    0.002    0.000    0.002    0.000 {method 'startswith' of 'bytes' objects}
     1716    0.002    0.000    0.003    0.000 base.py:2219(<listcomp>)
     5899    0.002    0.000    0.003    0.000 backend.py:441(is_sparse)
     3456    0.002    0.000    0.002    0.000 ops.py:3872(op_in_group)
     3068    0.002    0.000    0.004    0.000 ops.py:128(is_dense_tensor_like)
      340    0.002    0.000    0.600    0.002 array_grad.py:478(_ReshapeGrad)
     1388    0.002    0.000    0.003    0.000 ops.py:362(<listcomp>)
      708    0.002    0.000    0.675    0.001 math_ops.py:1261(reduce_sum)
     2774    0.002    0.000    0.005    0.000 ops.py:309(device)
     2109    0.002    0.000    0.003    0.000 gradients_impl.py:203(_AsList)
     1405    0.002    0.000    0.998    0.001 array_ops.py:249(shape)
     5930    0.002    0.000    0.002    0.000 {method 'join' of 'str' objects}
      682    0.002    0.000    0.005    0.000 tensor_shape.py:584(concatenate)
     1716    0.002    0.000    0.002    0.000 base.py:173(trainable_weights)
     3067    0.002    0.000    0.027    0.000 _methods.py:40(_all)
    10125    0.002    0.000    0.002    0.000 {method 'append' of 'collections.deque' objects}
     1716    0.002    0.000    0.006    0.000 base.py:2217(_object_list_uid)
     5899    0.002    0.000    0.002    0.000 session.py:125(<lambda>)
     1388    0.002    0.000    0.002    0.000 tensor_util.py:426(<listcomp>)
      716    0.002    0.000    0.005    0.000 dtypes.py:144(is_floating)
      344    0.002    0.000    0.014    0.000 gradients_impl.py:721(_AccumulatorShape)
     3259    0.002    0.000    0.002    0.000 session.py:124(<lambda>)
      101    0.002    0.000    0.005    0.000 iostream.py:180(schedule)
     2107    0.002    0.000    0.002    0.000 {method 'rstrip' of 'str' objects}
     5307    0.002    0.000    0.002    0.000 variables.py:455(_ref)
        1    0.002    0.002   11.572   11.572 backend.py:2506(gradients)
     1716    0.002    0.000    0.003    0.000 base.py:210(updates)
      699    0.002    0.000    0.006    0.000 op_def_library.py:572(<listcomp>)
      340    0.001    0.000    0.554    0.002 array_ops.py:1030(concat)
       13    0.001    0.000    0.032    0.002 generic_utils.py:275(update)
     1388    0.001    0.000    0.009    0.000 ops.py:366(_shape_tuple)
      694    0.001    0.000    0.001    0.000 selections.py:168(nselect)
      716    0.001    0.000    0.007    0.000 math_ops.py:2293(conj)
     1388    0.001    0.000    0.005    0.000 variables.py:906(get_shape)
     1020    0.001    0.000    0.006    0.000 ops.py:1267(name)
        4    0.001    0.000    0.002    0.000 training.py:39(_standardize_input_data)
     3258    0.001    0.000    0.001    0.000 session.py:351(<listcomp>)
     1396    0.001    0.000    0.002    0.000 ops.py:3878(_pop_control_dependencies_controller)
      694    0.001    0.000    1.931    0.003 backend.py:1565(square)
      694    0.001    0.000    0.002    0.000 dataset.py:519(<genexpr>)
     1412    0.001    0.000    0.038    0.000 session.py:507(_name_list)
     3471    0.001    0.000    0.001    0.000 variables.py:403(_AsTensor)
      694    0.001    0.000    0.002    0.000 {built-in method builtins.sum}
     2815    0.001    0.000    0.001    0.000 gradients_impl.py:679(<listcomp>)
      768    0.001    0.000    0.003    0.000 numerictypes.py:699(issubdtype)
     1720    0.001    0.000    0.001    0.000 {method 'count' of 'list' objects}
     1020    0.001    0.000    0.003    0.000 ops.py:1287(graph)
      694    0.001    0.000    0.002    0.000 gradients_impl.py:689(_GetGrad)
     1716    0.001    0.000    0.001    0.000 base.py:2222(_make_node_key)
        2    0.001    0.001    0.005    0.002 topology.py:715(<listcomp>)
     1381    0.001    0.000    0.001    0.000 {method 'indices' of 'slice' objects}
        1    0.001    0.001    0.003    0.003 topology.py:1259(<listcomp>)
        1    0.001    0.001    0.010    0.010 control_flow_ops.py:1291(MaybeCreateControlFlowState)
     3259    0.001    0.000    0.001    0.000 session.py:284(unique_fetches)
      699    0.001    0.000    0.001    0.000 op_def_library.py:487(<listcomp>)
        2    0.001    0.000    0.038    0.019 backend.py:2435(__init__)
      340    0.001    0.000    0.008    0.000 ops.py:1245(__init__)
     1722    0.001    0.000    0.001    0.000 math_ops.py:1992(<genexpr>)
     4164    0.001    0.000    0.001    0.000 selections.py:163(shape)
     2530    0.001    0.000    0.001    0.000 variables.py:872(initializer)
     1396    0.001    0.000    0.001    0.000 ops.py:3875(_push_control_dependencies_controller)
        1    0.001    0.001    0.002    0.002 gradients_impl.py:281(_StopOps)
     3855    0.001    0.000    0.001    0.000 ops.py:409(value_index)
     2040    0.001    0.000    0.001    0.000 ops.py:1252(values)
     2776    0.001    0.000    0.001    0.000 ops.py:2683(_set_control_flow_context)
     2410    0.001    0.000    0.001    0.000 base.py:103(_lcpl)
     1536    0.001    0.000    0.001    0.000 numerictypes.py:631(issubclass_)
     1388    0.001    0.000    0.001    0.000 dataset.py:520(<genexpr>)
        1    0.001    0.001    1.918    1.918 training.py:1296(_test_loop)
      690    0.001    0.000    0.007    0.000 ops.py:1128(convert_n_to_tensor_or_indexed_slices)
     2075    0.001    0.000    0.001    0.000 selections.py:303(<genexpr>)
       26    0.001    0.000    0.005    0.000 function_base.py:4125(_median)
      694    0.001    0.000    0.001    0.000 variables.py:550(constraint)
     2075    0.001    0.000    0.001    0.000 selections.py:267(<genexpr>)
     1716    0.001    0.000    0.001    0.000 base.py:177(non_trainable_weights)
      344    0.001    0.000    0.261    0.001 math_ops.py:813(to_int32)
     2040    0.001    0.000    0.001    0.000 ops.py:1262(dense_shape)
       26    0.001    0.000    0.001    0.000 {method 'partition' of 'numpy.ndarray' objects}
     1396    0.001    0.000    0.001    0.000 ops.py:3921(<listcomp>)
      340    0.001    0.000    0.496    0.001 array_ops.py:133(expand_dims)
       26    0.001    0.000    0.002    0.000 utils.py:1119(_median_nancheck)
       13    0.001    0.000    0.001    0.000 callbacks.py:228(on_batch_end)
      694    0.001    0.000    0.001    0.000 selections.py:409(<genexpr>)
        1    0.001    0.001    0.001    0.001 variables.py:1380(<listcomp>)
     1391    0.001    0.000    0.001    0.000 dtypes.py:272(name)
      342    0.001    0.000    0.255    0.001 array_ops.py:302(size)
     1408    0.001    0.000    0.001    0.000 session.py:344(unique_fetches)
     1397    0.001    0.000    0.001    0.000 ops.py:3865(control_inputs)
     1391    0.001    0.000    0.001    0.000 variables.py:425(value)
     1388    0.001    0.000    0.001    0.000 filters.py:88(rq_tuple)
        1    0.001    0.001   29.883   29.883 training.py:939(_make_train_function)
        1    0.001    0.001    0.002    0.002 topology.py:723(state_updates)
      354    0.000    0.000    0.004    0.000 tensor_util.py:236(<listcomp>)
      344    0.000    0.000    0.001    0.000 six.py:580(iterkeys)
       40    0.000    0.000    0.002    0.000 _methods.py:53(_mean)
     1409    0.000    0.000    0.000    0.000 session.py:442(fetches)
      340    0.000    0.000    0.001    0.000 ops.py:1282(dtype)
      374    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0x9e3d20}
     1409    0.000    0.000    0.000    0.000 session.py:450(targets)
      680    0.000    0.000    0.000    0.000 ops.py:1257(indices)
       13    0.000    0.000    0.025    0.002 threading.py:533(wait)
     1398    0.000    0.000    0.000    0.000 ops.py:1607(_get_control_flow_context)
       52    0.000    0.000    0.001    0.000 numeric.py:1459(normalize_axis_tuple)
      354    0.000    0.000    0.001    0.000 <string>:16(_make)
      697    0.000    0.000    0.000    0.000 base.py:97(_lapl)
      341    0.000    0.000    0.000    0.000 {method 'tostring' of 'numpy.ndarray' objects}
       26    0.000    0.000    0.001    0.000 numeric.py:1515(moveaxis)
      351    0.000    0.000    0.001    0.000 tensor_shape.py:867(scalar)
     1031    0.000    0.000    0.000    0.000 gradients_impl.py:706(<listcomp>)
        1    0.000    0.000    0.001    0.001 gradients_impl.py:486(<listcomp>)
       40    0.000    0.000    0.002    0.000 fromnumeric.py:2854(mean)
      714    0.000    0.000    0.003    0.000 math_ops.py:1238(_ReductionDims)
     2413    0.000    0.000    0.000    0.000 {built-in method builtins.abs}
       13    0.000    0.000    0.036    0.003 callbacks.py:117(on_batch_end)
        1    0.000    0.000    0.000    0.000 files.py:79(make_fid)
      344    0.000    0.000    0.000    0.000 gradients_impl.py:839(<listcomp>)
       59    0.000    0.000    0.004    0.000 iostream.py:342(write)
       13    0.000    0.000    0.003    0.000 callbacks.py:97(on_batch_begin)
      114    0.000    0.000    0.001    0.000 threading.py:1104(is_alive)
        1    0.000    0.000    0.002    0.002 gradients_impl.py:619(<listcomp>)
       17    0.000    0.000    0.000    0.000 {method 'tolist' of 'numpy.ndarray' objects}
        1    0.000    0.000  867.132  867.132 training.py:1436(fit)
       13    0.000    0.000    0.000    0.000 threading.py:251(_acquire_restore)
       26    0.000    0.000    0.007    0.000 function_base.py:3982(_ureduce)
       13    0.000    0.000    0.000    0.000 threading.py:215(__init__)
        1    0.000    0.000    0.000    0.000 gradients_impl.py:467(<listcomp>)
      696    0.000    0.000    0.000    0.000 array_grad.py:459(_IdGrad)
       17    0.000    0.000    0.268    0.016 training.py:373(_slice_arrays)
       26    0.000    0.000    0.000    0.000 {method 'flatten' of 'numpy.ndarray' objects}
       13    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.arange}
        5    0.000    0.000    0.041    0.008 math_grad.py:733(_MulGrad)
       13    0.000    0.000    0.031    0.002 callbacks.py:302(on_batch_end)
        1    0.000    0.000    0.000    0.000 {method 'shuffle' of 'mtrand.RandomState' objects}
       26    0.000    0.000    0.007    0.000 function_base.py:4037(median)
       13    0.000    0.000    0.024    0.002 threading.py:263(wait)
       40    0.000    0.000    0.000    0.000 _methods.py:43(_count_reduce_items)
        6    0.000    0.000    0.009    0.002 array_ops.py:1404(zeros)
        7    0.000    0.000    0.010    0.001 gen_array_ops.py:4607(_slice)
       59    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        1    0.000    0.000    0.036    0.036 variables.py:1359(variables_initializer)
        4    0.000    0.000    0.039    0.010 math_grad.py:797(_RealDivGrad)
        3    0.000    0.000    0.069    0.023 array_grad.py:47(_ConcatGradHelper)
       13    0.000    0.000    0.027    0.002 iostream.py:311(flush)
      344    0.000    0.000    0.000    0.000 {method 'keys' of 'dict' objects}
        6    0.000    0.000    0.000    0.000 training.py:221(<listcomp>)
        9    0.000    0.000    0.016    0.002 gen_array_ops.py:1742(fill)
        4    0.000    0.000    0.000    0.000 inspect.py:1079(getfullargspec)
        3    0.000    0.000    0.000    0.000 group.py:158(__getitem__)
       26    0.000    0.000    0.001    0.000 fromnumeric.py:578(partition)
      344    0.000    0.000    0.000    0.000 gradients_impl.py:761(DeviceKey)
      118    0.000    0.000    0.001    0.000 numeric.py:495(asanyarray)
        4    0.000    0.000    0.000    0.000 inspect.py:2092(_signature_from_function)
        8    0.000    0.000    0.006    0.001 gen_math_ops.py:3932(_select)
      114    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        7    0.000    0.000    0.005    0.001 gen_nn_ops.py:473(bias_add_grad)
       26    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
      101    0.000    0.000    0.000    0.000 iostream.py:87(_event_pipe)
       14    0.000    0.000    0.001    0.000 {method 'mean' of 'numpy.generic' objects}
        6    0.000    0.000    0.006    0.001 gen_math_ops.py:2891(_prod)
        2    0.000    0.000    0.048    0.024 control_flow_ops.py:2897(_GroupControlDeps)
       11    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.fromiter}
      344    0.000    0.000    0.000    0.000 gradients_impl.py:751(<lambda>)
       59    0.000    0.000    0.000    0.000 iostream.py:284(_is_master_process)
        1    0.000    0.000  809.055  809.055 topology.py:1035(save_weights)
        3    0.000    0.000    0.033    0.011 math_grad.py:38(_SumGrad)
       13    0.000    0.000    0.000    0.000 callbacks.py:298(on_batch_begin)
       13    0.000    0.000    0.000    0.000 threading.py:498(__init__)
        6    0.000    0.000    0.004    0.001 gen_control_flow_ops.py:539(_switch)
       22    0.000    0.000    0.000    0.000 inspect.py:2431(__init__)
       92    0.000    0.000    0.000    0.000 {built-in method time.time}
        7    0.000    0.000    0.005    0.001 gen_math_ops.py:2744(_neg)
        3    0.000    0.000    0.025    0.008 gen_array_ops.py:4522(shape_n)
        2    0.000    0.000    0.016    0.008 gen_array_ops.py:2827(_pack)
        3    0.000    0.000    0.057    0.019 math_grad.py:95(_MeanGrad)
        6    0.000    0.000    0.005    0.001 control_flow_ops.py:281(switch)
      104    0.000    0.000    0.000    0.000 numeric.py:1506(<genexpr>)
        4    0.000    0.000    0.004    0.001 control_flow_ops.py:373(merge)
        1    0.000    0.000    0.000    0.000 files.py:42(make_fapl)
       59    0.000    0.000    0.000    0.000 iostream.py:297(_schedule_flush)
        2    0.000    0.000    0.039    0.019 backend.py:2481(function)
        4    0.000    0.000    0.003    0.001 gen_control_flow_ops.py:208(_merge)
        4    0.000    0.000    0.013    0.003 control_flow_ops.py:1332(ZerosLikeOutsideLoop)
        5    0.000    0.000    0.004    0.001 gen_math_ops.py:1681(_floor_mod)
        1    0.000    0.000  867.132  867.132 {built-in method builtins.exec}
       52    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.normalize_axis_index}
        5    0.000    0.000    0.004    0.001 gen_math_ops.py:1646(_floor_div)
        4    0.000    0.000    0.000    0.000 {built-in method _warnings.warn}
        2    0.000    0.000    0.006    0.003 training.py:1368(_standardize_user_data)
        1    0.000    0.000  809.056  809.056 callbacks.py:415(on_epoch_end)
        2    0.000    0.000    0.021    0.011 math_grad.py:842(_MaximumMinimumGrad)
        4    0.000    0.000    0.000    0.000 device.py:128(parse_from_string)
        1    0.000    0.000    0.003    0.003 training.py:961(_make_test_function)
       26    0.000    0.000    0.000    0.000 core.py:6190(isMaskedArray)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.copyto}
        2    0.000    0.000    0.000    0.000 training.py:358(_make_batches)
        3    0.000    0.000    0.016    0.005 gen_control_flow_ops.py:283(no_op)
        6    0.000    0.000    0.000    0.000 gradients_impl.py:259(_IsTrainable)
        4    0.000    0.000    0.001    0.000 tf_inspect.py:32(getargspec)
        3    0.000    0.000    0.002    0.001 array_ops.py:387(rank_internal)
        4    0.000    0.000    0.000    0.000 inspect.py:2710(__init__)
        6    0.000    0.000    0.000    0.000 training.py:215(set_of_lengths)
        4    0.000    0.000    0.004    0.001 control_flow_grad.py:34(_SwitchGrad)
        2    0.000    0.000    0.013    0.006 math_grad.py:720(_SubGrad)
        2    0.000    0.000    0.005    0.002 math_ops.py:1164(range)
        1    0.000    0.000    0.006    0.006 gradients_impl.py:207(_DefaultGradYs)
        4    0.000    0.000    0.000    0.000 device.py:213(from_string)
        2    0.000    0.000    0.019    0.010 math_ops.py:2343(reduced_shape)
        3    0.000    0.000    0.069    0.023 array_grad.py:193(_ConcatGradV2)
        3    0.000    0.000    0.027    0.009 array_grad.py:80(_ExtractInputShapes)
        2    0.000    0.000    0.002    0.001 control_flow_ops.py:329(_SwitchRefOrTensor)
        3    0.000    0.000    0.003    0.001 gen_array_ops.py:5600(tile)
        3    0.000    0.000    0.000    0.000 files.py:146(attrs)
      114    0.000    0.000    0.000    0.000 threading.py:506(is_set)
       26    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
        7    0.000    0.000    0.005    0.001 nn_grad.py:237(_BiasAddGrad)
        5    0.000    0.000    0.004    0.001 math_ops.py:1075(floordiv)
        2    0.000    0.000    0.002    0.001 gen_array_ops.py:630(_concat_offset)
        4    0.000    0.000    0.001    0.000 inspect.py:1048(getargspec)
        4    0.000    0.000    0.000    0.000 inspect.py:2173(_signature_from_callable)
        2    0.000    0.000    0.002    0.001 gen_nn_ops.py:4193(_relu_grad)
        2    0.000    0.000    0.003    0.001 gen_data_flow_ops.py:580(dynamic_stitch)
        3    0.000    0.000    0.000    0.000 copy.py:66(copy)
        1    0.000    0.000    0.002    0.002 array_ops.py:893(_autopacking_helper)
        3    0.000    0.000    0.001    0.000 {built-in method builtins.print}
        2    0.000    0.000    0.002    0.001 control_flow_grad.py:89(_MergeGrad)
        1    0.000    0.000    0.002    0.002 gen_state_ops.py:72(assign_add)
        2    0.000    0.000    0.001    0.001 gen_math_ops.py:2850(_pow)
       26    0.000    0.000    0.000    0.000 numeric.py:1577(<listcomp>)
       26    0.000    0.000    0.000    0.000 {built-in method _thread.allocate_lock}
        8    0.000    0.000    0.006    0.001 array_ops.py:2397(where)
        4    0.000    0.000    0.000    0.000 device.py:65(__init__)
        2    0.000    0.000    0.002    0.001 gen_math_ops.py:3229(_range)
        1    0.000    0.000    0.010    0.010 gen_array_ops.py:5111(_split_v)
        1    0.000    0.000    0.000    0.000 {method 'format' of 'str' objects}
        2    0.000    0.000    0.001    0.001 gen_array_ops.py:5946(_zeros_like)
        2    0.000    0.000    0.001    0.001 array_ops.py:1447(zeros_like)
       13    0.000    0.000    0.000    0.000 threading.py:242(__exit__)
        2    0.000    0.000    0.005    0.002 math_grad.py:903(_SelectGrad)
        5    0.000    0.000    0.010    0.002 math_grad.py:33(_safe_shape_div)
        1    0.000    0.000    0.000    0.000 files.py:230(__init__)
        1    0.000    0.000  867.132  867.132 <string>:1(<module>)
        6    0.000    0.000    0.009    0.002 math_ops.py:1414(reduce_prod)
        2    0.000    0.000    0.001    0.001 gen_math_ops.py:3360(reciprocal)
       13    0.000    0.000    0.000    0.000 threading.py:239(__enter__)
        1    0.000    0.000    0.001    0.001 gen_array_ops.py:5157(_squeeze)
        7    0.000    0.000    0.010    0.001 array_ops.py:542(slice)
       22    0.000    0.000    0.000    0.000 inspect.py:2488(annotation)
       26    0.000    0.000    0.000    0.000 inspect.py:2755(<genexpr>)
        2    0.000    0.000    0.000    0.000 copy.py:268(_reconstruct)
        1    0.000    0.000  809.057  809.057 callbacks.py:86(on_epoch_end)
        1    0.000    0.000    0.000    0.000 callbacks.py:287(on_epoch_begin)
        2    0.000    0.000    0.005    0.002 topology.py:713(stateful)
        1    0.000    0.000    0.003    0.003 math_grad.py:383(_Log1pGrad)
        8    0.000    0.000    0.000    0.000 ops.py:3882(_current_control_dependencies)
        2    0.000    0.000    0.000    0.000 device.py:283(_device_function)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.empty}
        2    0.000    0.000    0.000    0.000 {method '__reduce_ex__' of 'object' objects}
        1    0.000    0.000    0.000    0.000 callbacks.py:340(on_epoch_end)
        2    0.000    0.000    0.001    0.001 math_ops.py:523(pow)
        4    0.000    0.000    0.000    0.000 ops.py:3671(device)
        6    0.000    0.000    0.000    0.000 backend.py:313(learning_phase)
       52    0.000    0.000    0.000    0.000 {built-in method _operator.index}
        2    0.000    0.000    0.000    0.000 training.py:203(_check_array_lengths)
        1    0.000    0.000    0.000    0.000 callbacks.py:72(on_epoch_begin)
        1    0.000    0.000    0.000    0.000 files.py:291(<listcomp>)
        2    0.000    0.000    0.000    0.000 training.py:369(<listcomp>)
        2    0.000    0.000    0.000    0.000 training.py:467(_standardize_weights)
        1    0.000    0.000    0.002    0.002 callbacks.py:319(on_epoch_end)
        1    0.000    0.000    0.002    0.002 math_grad.py:374(_LogGrad)
        1    0.000    0.000    0.001    0.001 gen_math_ops.py:1754(greater_equal)
        1    0.000    0.000    0.001    0.001 gen_math_ops.py:4039(_sigmoid_grad)
        2    0.000    0.000    0.000    0.000 shape_base.py:255(expand_dims)
       13    0.000    0.000    0.000    0.000 threading.py:254(_is_owned)
       13    0.000    0.000    0.000    0.000 threading.py:248(_release_save)
       39    0.000    0.000    0.000    0.000 callbacks.py:205(on_batch_begin)
       26    0.000    0.000    0.000    0.000 callbacks.py:208(on_batch_end)
        4    0.000    0.000    0.000    0.000 topology.py:708(uses_learning_phase)
        1    0.000    0.000    0.001    0.001 math_grad.py:596(_SigmoidGrad)
        1    0.000    0.000    0.002    0.002 state_ops.py:220(assign_add)
        2    0.000    0.000    0.000    0.000 ops.py:4242(device)
        1    0.000    0.000    0.000    0.000 compat.py:103(filename_encode)
        1    0.000    0.000    0.000    0.000 training.py:1422(_get_deduped_metrics_names)
        1    0.000    0.000    0.000    0.000 generic_utils.py:261(__init__)
        1    0.000    0.000    0.001    0.001 gen_math_ops.py:2130(less_equal)
        1    0.000    0.000    0.000    0.000 {tensorflow.python.framework.fast_tensor_util.AppendInt64ArrayToTensorProto}
       17    0.000    0.000    0.000    0.000 {built-in method builtins.min}
       13    0.000    0.000    0.000    0.000 {method '__enter__' of '_thread.lock' objects}
        2    0.000    0.000    0.000    0.000 training.py:250(_check_loss_and_target_compatibility)
        1    0.000    0.000    0.000    0.000 callbacks.py:149(on_train_end)
        4    0.000    0.000    0.000    0.000 control_flow_ops.py:401(<listcomp>)
        2    0.000    0.000    0.000    0.000 ops.py:1702(_set_device)
        8    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        2    0.000    0.000    0.002    0.001 nn_grad.py:325(_ReluGrad)
        2    0.000    0.000    0.000    0.000 device.py:175(merge_from)
        1    0.000    0.000    0.000    0.000 ops.py:461(__str__)
        6    0.000    0.000    0.000    0.000 ops.py:494(__iter__)
        2    0.000    0.000    0.000    0.000 ops.py:1305(_device_string)
       22    0.000    0.000    0.000    0.000 {method 'isidentifier' of 'str' objects}
       13    0.000    0.000    0.000    0.000 {method 'release' of '_thread.lock' objects}
        1    0.000    0.000   11.572   11.572 optimizers.py:94(get_gradients)
        4    0.000    0.000    0.000    0.000 topology.py:711(<listcomp>)
        4    0.000    0.000    0.000    0.000 control_flow_ops.py:404(<listcomp>)
        3    0.000    0.000    0.002    0.001 array_ops.py:355(rank)
        1    0.000    0.000    0.010    0.010 array_ops.py:1214(split)
        2    0.000    0.000    0.000    0.000 device.py:255(merge_device)
        1    0.000    0.000    0.000    0.000 numeric.py:1917(isscalar)
        8    0.000    0.000    0.000    0.000 inspect.py:159(isfunction)
       22    0.000    0.000    0.000    0.000 inspect.py:2492(kind)
        1    0.000    0.000    0.000    0.000 {method 'update' of 'set' objects}
       13    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.lock' objects}
        2    0.000    0.000    0.000    0.000 training.py:996(_check_num_samples)
        1    0.000    0.000    0.000    0.000 callbacks.py:239(on_epoch_end)
        1    0.000    0.000    0.001    0.001 math_grad.py:355(_ExpGrad)
        6    0.000    0.000    0.000    0.000 control_flow_ops.py:1565(pred)
        1    0.000    0.000    0.000    0.000 dtypes.py:150(is_complex)
       11    0.000    0.000    0.000    0.000 ops.py:4507(_GetGlobalDefaultGraph)
        4    0.000    0.000    0.000    0.000 tf_decorator.py:99(unwrap)
        2    0.000    0.000    0.000    0.000 numeric.py:146(ones)
        4    0.000    0.000    0.000    0.000 {method 'values' of 'mappingproxy' objects}
        1    0.000    0.000    0.000    0.000 callbacks.py:64(set_params)
        1    0.000    0.000    0.000    0.000 callbacks.py:139(on_train_begin)
        2    0.000    0.000    0.000    0.000 math_ops.py:1227(<listcomp>)
        4    0.000    0.000    0.000    0.000 control_flow_ops.py:403(<listcomp>)
        2    0.000    0.000    0.000    0.000 control_flow_ops.py:481(_GetOutputContext)
        1    0.000    0.000    0.001    0.001 array_ops.py:2348(squeeze)
        2    0.000    0.000    0.000    0.000 _internal.py:243(__init__)
        1    0.000    0.000    0.000    0.000 callbacks.py:68(set_model)
        3    0.000    0.000    0.000    0.000 callbacks.py:190(__init__)
        1    0.000    0.000    0.000    0.000 callbacks.py:274(__init__)
        1    0.000    0.000    0.002    0.002 backend.py:1099(update_add)
        2    0.000    0.000    0.001    0.001 backend.py:1675(pow)
       12    0.000    0.000    0.000    0.000 control_flow_ops.py:1573(branch)
        1    0.000    0.000    0.000    0.000 tensor_shape.py:460(__str__)
        4    0.000    0.000    0.000    0.000 device.py:99(job)
        4    0.000    0.000    0.000    0.000 device.py:146(<listcomp>)
        2    0.000    0.000    0.000    0.000 device.py:192(to_string)
        4    0.000    0.000    0.000    0.000 <string>:12(__new__)
       30    0.000    0.000    0.000    0.000 inspect.py:2484(default)
        2    0.000    0.000    0.000    0.000 training.py:1406(<listcomp>)
        1    0.000    0.000    0.000    0.000 callbacks.py:56(__init__)
        4    0.000    0.000    0.000    0.000 training.py:149(_standardize_sample_or_class_weights)
        1    0.000    0.000    0.001    0.001 math_grad.py:262(_NegGrad)
        8    0.000    0.000    0.000    0.000 math_ops.py:1225(<genexpr>)
        2    0.000    0.000    0.000    0.000 tensor_shape.py:545(num_elements)
        3    0.000    0.000    0.000    0.000 ops.py:3233(_last_id)
        4    0.000    0.000    0.000    0.000 device.py:88(_clear)
        4    0.000    0.000    0.000    0.000 device.py:95(job)
        4    0.000    0.000    0.000    0.000 device.py:110(replica)
        4    0.000    0.000    0.000    0.000 device.py:121(task)
        2    0.000    0.000    0.000    0.000 ops.py:1027(convert_to_tensor_or_indexed_slices)
       44    0.000    0.000    0.000    0.000 inspect.py:2480(name)
        2    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}
        8    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
        2    0.000    0.000    0.000    0.000 {built-in method posix.fspath}
        4    0.000    0.000    0.000    0.000 callbacks.py:196(set_model)
        4    0.000    0.000    0.000    0.000 training.py:165(<listcomp>)
        2    0.000    0.000    0.000    0.000 training.py:198(_standardize_sample_weights)
        1    0.000    0.000    0.001    0.001 backend.py:1060(cast)
        1    0.000    0.000    0.000    0.000 gradients_impl.py:485(<listcomp>)
        1    0.000    0.000    0.011    0.011 math_grad.py:860(_MaximumGrad)
        1    0.000    0.000    0.010    0.010 math_grad.py:866(_MinimumGrad)
        2    0.000    0.000    0.000    0.000 math_grad.py:988(_FloorGrad)
        4    0.000    0.000    0.000    0.000 control_flow_ops.py:398(<listcomp>)
        4    0.000    0.000    0.000    0.000 control_flow_ops.py:1314(IsSwitch)
        4    0.000    0.000    0.000    0.000 tf_inspect.py:44(<genexpr>)
        2    0.000    0.000    0.000    0.000 copyreg.py:87(__newobj__)
        1    0.000    0.000    0.000    0.000 training.py:1600(<listcomp>)
        1    0.000    0.000    0.000    0.000 callbacks.py:58(<listcomp>)
        1    0.000    0.000    0.000    0.000 callbacks.py:159(__iter__)
        4    0.000    0.000    0.000    0.000 callbacks.py:193(set_params)
        4    0.000    0.000    0.000    0.000 callbacks.py:214(on_train_end)
        1    0.000    0.000    0.000    0.000 callbacks.py:224(on_epoch_begin)
        1    0.000    0.000    0.000    0.000 callbacks.py:283(on_train_begin)
        1    0.000    0.000    0.000    0.000 callbacks.py:336(on_train_begin)
        2    0.000    0.000    0.000    0.000 training.py:193(_standardize_class_weights)
        3    0.000    0.000    0.000    0.000 backend.py:139(floatx)
        1    0.000    0.000    0.000    0.000 gradients_impl.py:310(<genexpr>)
        1    0.000    0.000    0.000    0.000 gen_array_ops.py:5198(<listcomp>)
        1    0.000    0.000    0.000    0.000 op_def_library.py:710(<listcomp>)
        1    0.000    0.000    0.000    0.000 tensor_shape.py:42(__str__)
        1    0.000    0.000    0.000    0.000 {method 'SetInParent' of 'google.protobuf.pyext._message.CMessage' objects}
        1    0.000    0.000    0.000    0.000 files.py:178(fid)
        1    0.000    0.000    0.000    0.000 files.py:290(<listcomp>)
        4    0.000    0.000    0.000    0.000 inspect.py:2785(parameters)
        1    0.000    0.000    0.000    0.000 os.py:794(fsencode)
        2    0.000    0.000    0.000    0.000 callbacks.py:199(on_epoch_begin)
        2    0.000    0.000    0.000    0.000 callbacks.py:211(on_train_begin)
        1    0.000    0.000    0.000    0.000 backend.py:93(backend)
        1    0.000    0.000    0.000    0.000 gradients_impl.py:487(<listcomp>)
        1    0.000    0.000    0.000    0.000 tensor_shape.py:66(__int__)
        4    0.000    0.000    0.000    0.000 device.py:106(replica)
        4    0.000    0.000    0.000    0.000 device.py:117(task)
        2    0.000    0.000    0.000    0.000 _internal.py:268(get_data)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        4    0.000    0.000    0.000    0.000 inspect.py:2789(return_annotation)

As you can see, the save_weights function in topology.py is what takes so much time: it accounts for about 809 of the 867 seconds spent in fit. I ran the debugger to get this far, but I don't know how to fix it.
Someone should fix this soon. The model I ran is not even big enough to be slow on a CPU machine, yet it takes a huge amount of time to save the model after the 1st epoch.

This happens to me only in the 1st epoch. I don't know whether it is specific to embedding layers or affects any layer. In the past I sometimes felt that saving after the 1st epoch took longer than later saves, but I didn't care because even the slow ones finished within seconds. This time the run took 867 seconds, which is tremendously slow on a GPU for such a simple model.

So no replies at all for this from the keras team?

I'm facing the same problem with the latest version of Keras + TensorFlow 1.8. If the model we try to save has many horizontal layers that are then concatenated, saving at the end of the first epoch is extremely slow even if the number of params is fairly small (around 1.5 million). Another model with only a few layers but 10 times more params takes only a few seconds to save.

Exactly, it has nothing to do with the number of parameters. Has anyone managed to find a workaround? Please post here if you manage to solve it.

I am also having this problem -- many concatenated layers and super-slow saving to hdf5 during checkpointing. I'm using the R interface to Keras with TF

Same issue here. I'm running the latest version with the TensorFlow GPU build, and it takes around 5 minutes to save a model.

It is a real shame that when we want to use keras for non-trivial datasets which are not images and language, we are not given any help

Hi, I'll try to take a look at it.
We could work on this better if you could provide a standalone reproducible example.
Some advice:

  • Don't use custom data or custom paths.
  • Use random arrays or even np.ones, np.zeros.
  • The example should run with Keras (and deps) alone.
  • Should be Python3 compatible.
  • Should not be OS specific.
  • The file should reproduce the bug with high fidelity.
  • Link to a gist would be appreciated.

Thanks!
Dref360

I do not have much time to investigate but I did try my best to trigger the behavior in a Jupyter Notebook. It appears to only affect the first save. Make sure the kernel is completely killed and launch it on a freshly instantiated notebook.

import random
from multiprocessing import cpu_count
import numpy as np
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint, CSVLogger

num_classes = 5
model = InceptionResNetV2(weights=None, include_top=True, classes=5)
model.compile(
    optimizer='sgd',
    loss='categorical_crossentropy',
    metrics=['accuracy'])

def data_generator():
    batch_size = 8
    while True:
        x = np.random.rand(batch_size, 299, 299, 3,)
        a = np.array([random.choice(range(num_classes)) for _ in range(batch_size)])
        y = np.zeros((batch_size, num_classes))
        y[np.arange(batch_size), a] = 1
        yield x, y
train_generator = data_generator()
validation_generator = data_generator()
csvl = CSVLogger('deleteme.csv')
mc = ModelCheckpoint('model_deleteme.h5', save_best_only=False, verbose=1)
es = EarlyStopping(patience=2, verbose=1)
rlrop = ReduceLROnPlateau(patience=1, verbose=1)
model.fit_generator(
    train_generator, epochs=25,
    steps_per_epoch=3,
    validation_data=validation_generator,
    validation_steps=3,
    use_multiprocessing=True,
    workers=cpu_count(),
    verbose=1,
    callbacks = [mc, rlrop, es, csvl])

model.save('model_deleteme.h5')

Please confirm reproducibility.
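
To check the "first save only" behavior, one option is to time several consecutive saves in the same process. Below is a minimal sketch, not a definitive benchmark: `time_calls` is a hypothetical helper, and `fake_save` is only a stand-in for `model.save('model_deleteme.h5')` so that the snippet runs without Keras installed.

```python
import time


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return the wall-clock duration of each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def fake_save():
    # Placeholder (assumption): swap in `model.save('model_deleteme.h5')` here.
    time.sleep(0.01)


if __name__ == "__main__":
    for i, seconds in enumerate(time_calls(fake_save), start=1):
        print("save #%d took %.3f s" % (i, seconds))
```

If the first duration is far larger than the rest, that would match what others in this thread describe.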

FYI: I came across the same issue when using DenseNet201 from keras.applications

I am having the same issue, but I am not using RNNs, just transfer learning with various ImageNet models.
Windows 10

>>> keras.__version__
'2.2.4'
>>> import tensorflow as tf; tf.__version__
'1.9.0'

Same issue here with Keras 2.1.5 through 2.2.4. It only appears at the first epoch; after that, saving is much faster.

I have exactly this issue. I could have written these quotes myself, for example:

This:
"I am also having this problem -- many concatenated layers and super-slow saving to hdf5 during checkpointing. "
This:
"It appears to only affect the first save."

I've been thumbing through the save logic to see where the problem might be. I think the h5 dict gets reused between checkpoints, so it might be an inefficient use of h5 in certain model configurations?

I've been thinking about how I can trick new experiments into using an old h5 file when checkpointing, to avoid this wait.

I've been dealing with it for about a year, but was hesitant to post without a big, well-known research model causing it. Someone above reports that DenseNet does it, so there is that now.

Same problem here (DenseNet121 with keras 2.2.4).

For debugging purposes, I've profiled the call to Model.save. It seems the slowdown is related to the _pywrap_tensorflow_internal.TF_SessionRun_wrapper call. Here are the stats ordered by cumtime:

       1434473 function calls (1404566 primitive calls) in 205.269 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    0.000    0.000  205.269  102.634 interactiveshell.py:2852(run_code)
        2    0.000    0.000  205.269  102.634 {built-in method builtins.exec}
        1    0.000    0.000  205.269  205.269 <ipython-input-28-c18c96b1252f>:6(<module>)
        1    0.000    0.000  205.269  205.269 network.py:1051(save)
        1    0.001    0.001  205.269  205.269 saving.py:335(save_model)
        1    0.090    0.090  205.248  205.248 saving.py:27(_serialize_model)
      247    0.002    0.000  203.305    0.823 session.py:840(run)
      247    0.007    0.000  203.303    0.823 session.py:1084(_run)
      247    0.003    0.000  203.194    0.823 session.py:1310(_do_run)
      247    0.001    0.000  203.190    0.823 session.py:1354(_do_call)
      247    0.001    0.000  203.189    0.823 session.py:1337(_run_fn)
      247    0.001    0.000  203.173    0.823 session.py:1425(_call_tf_sessionrun)
      247  203.172    0.823  203.172    0.823 {built-in method _pywrap_tensorflow_internal.TF_SessionRun_wrapper}
      430    0.010    0.000  200.622    0.467 tensorflow_backend.py:2410(batch_get_value)
        1    0.000    0.000    3.375    3.375 optimizers.py:516(get_config)
        4    0.000    0.000    3.375    0.844 tensorflow_backend.py:2398(get_value)
        4    0.000    0.000    3.364    0.841 variables.py:1879(eval)
        4    0.000    0.000    3.364    0.841 ops.py:710(eval)
        4    0.000    0.000    3.364    0.841 ops.py:5542(_eval_using_default_session)
     1776    0.042    0.000    0.998    0.001 io_utils.py:200(__setitem__)
      247    0.140    0.001    0.680    0.003 tensorflow_backend.py:156(get_session)
...
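
For anyone who wants to collect the same kind of listing on their own model, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. The `profile_call` helper and the `slow_save` placeholder are made up for illustration; in practice the callable would be something like `lambda: model.save(path)`.

```python
import cProfile
import io
import pstats
import time


def profile_call(fn, top=15):
    """Run fn() under cProfile and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" gives the same ordering as the listing above (cumtime).
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def slow_save():
    # Placeholder (assumption): stands in for the real model.save(...) call.
    time.sleep(0.05)


if __name__ == "__main__":
    print(profile_call(slow_save))
```

Sorting by cumulative time makes it easy to see which top-level call (here it was `batch_get_value` and the session runs beneath it) dominates the save.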

I am also having either the same or a related problem.

I have a network with hundreds of small layers (total trainable parameters less than 400k), with many layers concatenated at the end (similar to other people in the thread). For me, even generating the Model object with keras takes well over an hour. Most of the time is spent in _pywrap_tensorflow_internal.TF_SessionRun_wrapper as in the previous comment.

Using Mask R-CNN, saving the first checkpoint is very slow; there is no problem with the later ones.

Why can't we spawn a separate thread for saving models? Just saying...
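
To make the idea concrete, here is a rough sketch of handing the write to a background thread. It assumes the weights have already been copied out of the model (for example with `get_weights()`) and only moves the file write off the training loop. `write_in_background` and `fake_write` are hypothetical names, and if the time is really spent fetching values from the TensorFlow session, as the profile above suggests, this alone would probably not remove the stall.

```python
import threading


def write_in_background(snapshot, write_fn):
    """Start write_fn(snapshot) on a worker thread and return the thread."""
    worker = threading.Thread(target=write_fn, args=(snapshot,))
    worker.start()
    return worker


written = []


def fake_write(snapshot):
    # Placeholder (assumption): stands in for the real file write to disk.
    written.append(list(snapshot))


if __name__ == "__main__":
    worker = write_in_background([1.0, 2.0, 3.0], fake_write)
    # Join before exiting, or before starting the next save to the same path.
    worker.join()
    print(written)
```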

same issue here

I also faced the same problem when working with a U-Net with a DenseNet201 backbone (see segmentation_models) and Keras 2.2.5. As far as I understand, the problem is with Keras' save/save_weights functions, which are EXTREMELY slow when generating the model file (for the first time?). I base this assumption on the fact that I am able to extract the weights into NumPy arrays with the .get_weights() function and save them, e.g. as a .npy file.

With this in mind, I ended up writing my own checkpointer callback. I will share it here in case someone else finds it useful. I hope the issue gets fixed soon, before all the Keras folks escape to the fastai camp!

import logging

import numpy as np
from keras.callbacks import Callback


class Checkpoints(Callback):
    """Save the model weights to a .npy file whenever the monitored metric improves."""

    def __init__(self, filepath, monitor='val_loss', verbose=0, mode='auto'):
        super(Checkpoints, self).__init__()
        self.monitor = monitor
        self.filepath = filepath
        self.verbose = verbose

        if mode not in ['auto', 'min', 'max']:
            logging.warning('Checkpoints mode %s is unknown, '
                            'fallback to auto mode.', mode)
            mode = 'auto'

        # Pick the comparison and the starting "best" value for the metric.
        if mode == 'min':
            self.monitor_op = np.less
            self.best = np.inf
        elif mode == 'max':
            self.monitor_op = np.greater
            self.best = -np.inf
        else:
            # Auto mode: accuracy-like metrics should grow, everything else shrink.
            if 'acc' in self.monitor or self.monitor.startswith('fmeasure'):
                self.monitor_op = np.greater
                self.best = -np.inf
            else:
                self.monitor_op = np.less
                self.best = np.inf

    def set_model(self, model):
        self.model = model

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        current = logs.get(self.monitor)
        if current is None:
            logging.warning('Can save best model only with %s available, '
                            'skipping.', self.monitor)
        elif self.monitor_op(current, self.best):
            if self.verbose > 0:
                print('\nEpoch %05d: %s improved from %0.5f to %0.5f,'
                      ' saving model to %s' % (epoch + 1, self.monitor,
                                               self.best,
                                               current, self.filepath))
            self.best = current
            # The weight arrays have different shapes, so pack them into an
            # object array; np.load then needs allow_pickle=True.
            weights = self.model.get_weights()
            packed = np.empty(len(weights), dtype=object)
            for i, w in enumerate(weights):
                packed[i] = w
            np.save(self.filepath, packed)
        elif self.verbose > 0:
            print('\nEpoch %05d: %s did not improve from %0.5f' %
                  (epoch + 1, self.monitor, self.best))

Usage:

...
checkpoints = Checkpoints("best.npy", verbose=1)
...
model.fit_generator(train_gen,
                    steps_per_epoch = len(train_gen),
                    validation_data = valid_gen,
                    validation_steps = len(valid_gen),
                    epochs = EPOCHS,
                    callbacks = [checkpoints])

...
weights = np.load("best.npy", allow_pickle=True)
model.set_weights(weights)
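
A possible variation on the above, not what the original callback does: storing the weight list in a single .npz archive with np.savez avoids the pickled object array, so loading does not need allow_pickle=True. This is a minimal sketch; `save_weight_list` and `load_weight_list` are hypothetical helper names, and the two small arrays stand in for the result of `model.get_weights()`.

```python
import os
import tempfile

import numpy as np


def save_weight_list(path, weights):
    """Store a list of arrays in one .npz file under the keys w0, w1, ..."""
    np.savez(path, **{"w%d" % i: w for i, w in enumerate(weights)})


def load_weight_list(path):
    """Return the stored arrays in their original order."""
    with np.load(path) as data:
        return [data["w%d" % i] for i in range(len(data.files))]


# Stand-in for model.get_weights(): arrays with different shapes.
weights = [np.ones((3, 2), dtype=np.float32), np.zeros(2, dtype=np.float32)]
path = os.path.join(tempfile.mkdtemp(), "best.npz")
save_weight_list(path, weights)
restored = load_weight_list(path)
```

The restored list could then be passed to `model.set_weights(restored)` in the same way as in the usage above.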

Having the same issue here as well, and it is not only the first save that is affected, but all subsequent ones too... Wow, the issue was reported 2 years ago... and no attention? Keras team, c'mon.

Same issue, using pre-trained NASNet: both saving and loading are very slow (5 minutes). Keras 2.3.1
