Rasa version:
Rasa 1.10.1
Python version:
Python 3.7.6
Operating system (windows, osx, ...):
Consistent on osx and linux.
For example on OSX 10.14.6 using 1.7 GHz Intel Core i7 16 GB 2133 MHz LPDDR3
Issue:
We use the memoization policy (no ML-based policy), have a domain with a few hundred slots, the vast majority is unfeaturized. Training takes about 15 minutes. We ran profiling, and noticed that a lot of time is spent simply copying the Domain dictionary. We have observed in the past similar issues at runtime, where time is spent copying the domain dictionary simply to load the tracker in memory.
It seems like this is done because the domain is mutated with the actual values of the slots. Perhaps a better implementation would avoid mutating the domain, and keep values in a separate structure.
Error (including full traceback):
Here's the profiling data
stats.zip
You can parse it using :
p = pstats.Stats('stats')
p.sort_stats('cumulative').print_stats(100)
Here's the output
>>> p.sort_stats('cumulative').print_stats(100)
Thu Jun 18 13:44:27 2020 stats
7606471166 function calls (6713924733 primitive calls) in 2791.514 seconds
Ordered by: cumulative time
List reduced from 1374 to 100 due to restriction <100>
ncalls tottime percall cumtime percall filename:lineno(function)
4 0.005 0.001 5542.777 1385.694 train.py:23(train)
1 0.000 0.000 2791.515 2791.515 {built-in method builtins.exec}
1 0.000 0.000 2791.515 2791.515 <string>:1(<module>)
1 0.019 0.019 2791.514 2791.514 {method 'run_until_complete' of 'uvloop.loop.Loop' objects}
10 0.031 0.003 2770.536 277.054 train.py:55(train_async)
9 0.001 0.000 2770.418 307.824 train.py:118(_train_async_internal)
4 0.004 0.001 2752.328 688.082 train.py:205(_do_training)
4 1.061 0.265 2752.324 688.081 train.py:332(_train_core_with_validated_data)
3 0.036 0.012 2655.368 885.123 agent.py:615(load_data)
3 36.174 12.058 2655.332 885.111 __init__.py:35(load_data)
2 9.935 4.968 2619.143 1309.572 generator.py:195(generate)
1480 2.050 0.001 2600.574 1.757 generator.py:502(_process_step)
192965 5.976 0.000 2298.746 0.012 generator.py:84(copy)
196695 0.539 0.000 2194.473 0.011 generator.py:74(init_copy)
196697 0.456 0.000 2193.916 0.011 generator.py:46(__init__)
196697 1.112 0.000 2193.460 0.011 trackers.py:114(__init__)
196697 52.468 0.000 2166.356 0.011 trackers.py:135(<dictcomp>)
963628955/71794526 918.577 0.000 2114.312 0.000 copy.py:132(deepcopy)
71811240/71802852 174.158 0.000 1740.057 0.000 copy.py:268(_reconstruct)
71823031/71803382 262.444 0.000 1179.439 0.000 copy.py:236(_deepcopy_dict)
18453936 12.414 0.000 493.042 0.000 generator.py:113(update)
1161527 6.041 0.000 331.654 0.000 generator.py:106(_append_current_state)
1163393 2.001 0.000 325.754 0.000 domain.py:694(get_active_states)
1163393 207.800 0.000 319.673 0.000 domain.py:621(get_parsing_states)
2044613317 188.622 0.000 188.622 0.000 {method 'get' of 'dict' objects}
146792854 130.723 0.000 160.374 0.000 copy.py:252(_keep_alive)
18453936 14.887 0.000 129.547 0.000 trackers.py:465(update)
143622366 30.846 0.000 119.947 0.000 copy.py:273(<genexpr>)
1329017704 104.614 0.000 104.614 0.000 {built-in method builtins.id}
349158 0.485 0.000 88.073 0.000 __init__.py:828(apply_to)
2 0.012 0.006 85.022 42.511 agent.py:660(train)
2 0.009 0.005 85.001 42.500 ensemble.py:116(train)
2 0.267 0.133 74.798 37.399 ensemble.py:67(_training_events_from_trackers)
816836101 69.253 0.000 69.253 0.000 copy.py:190(_deepcopy_atomic)
545855 1.613 0.000 67.244 0.000 trackers.py:554(_reset)
545855 34.013 0.000 60.251 0.000 trackers.py:565(_reset_slots)
74321166 54.157 0.000 54.157 0.000 {method 'items' of 'dict' objects}
16287502 37.387 0.000 48.399 0.000 slots.py:218(as_feature)
349158 4.713 0.000 46.085 0.000 trackers.py:412(replay_events)
71811240 37.090 0.000 37.090 0.000 {method '__reduce_ex__' of 'object' objects}
71802852 26.244 0.000 36.611 0.000 copyreg.py:87(__newobj__)
227780667/227780666 31.961 0.000 32.024 0.000 {built-in method builtins.isinstance}
19106920 11.743 0.000 27.440 0.000 __init__.py:1041(apply_to)
199237075 26.131 0.000 26.131 0.000 slots.py:65(reset)
145847684 25.925 0.000 25.925 0.000 {built-in method builtins.getattr}
3158640/3148225 7.254 0.000 25.041 0.000 copy.py:210(_deepcopy_list)
371122367 24.844 0.000 24.844 0.000 slots.py:187(as_feature)
74180950 24.241 0.000 24.241 0.000 {method 'update' of 'dict' objects}
13960716 20.176 0.000 22.337 0.000 slots.py:128(as_feature)
352752 11.608 0.000 21.672 0.000 trackers.py:382(applied_events)
6 0.000 0.000 20.144 3.357 rasa.py:63(get_domain)
6 0.000 0.000 20.135 3.356 domain.py:88(load)
6 0.000 0.000 20.014 3.336 domain.py:105(from_path)
6 0.000 0.000 20.013 3.336 domain.py:121(from_file)
6 0.001 0.000 20.012 3.335 domain.py:125(from_yaml)
14 0.000 0.000 19.788 1.413 io.py:102(read_yaml)
20 0.000 0.000 19.768 0.988 constructor.py:108(get_single_data)
14 0.006 0.000 19.673 1.405 main.py:328(load)
20 0.000 0.000 19.637 0.982 composer.py:70(get_single_node)
20 0.000 0.000 19.632 0.982 composer.py:95(compose_document)
24978/20 0.093 0.000 19.632 0.982 composer.py:109(compose_node)
4990/20 0.046 0.000 19.630 0.982 composer.py:194(compose_mapping_node)
297943 0.130 0.000 19.157 0.000 generator.py:55(past_states)
1866 0.007 0.000 19.028 0.010 trackers.py:191(past_states)
1866 0.049 0.000 19.017 0.010 domain.py:701(states_for_tracker_history)
1866 0.005 0.000 18.968 0.010 domain.py:706(<listcomp>)
3732 0.010 0.000 18.443 0.005 trackers.py:308(generate_all_prior_trackers)
72024127 17.952 0.000 17.952 0.000 {method 'match' of 're.Pattern' objects}
63 0.044 0.001 17.638 0.280 tarfile.py:234(copyfileobj)
2 0.000 0.000 16.988 8.494 model.py:442(package_model)
2 0.001 0.001 16.988 8.494 model.py:234(create_package_rasa)
52/6 0.002 0.000 16.913 2.819 tarfile.py:1905(add)
52 0.001 0.000 16.809 0.323 tarfile.py:1956(addfile)
14444 0.108 0.000 16.623 0.001 gzip.py:247(write)
14442 16.046 0.001 16.046 0.001 {method 'compress' of 'zlib.Compress' objects}
19586 0.059 0.000 15.703 0.001 composer.py:142(compose_scalar_node)
109841420 15.565 0.000 15.565 0.000 {method 'append' of 'list' objects}
24978 4.685 0.000 15.553 0.001 resolver.py:357(resolve)
19106920 9.615 0.000 12.929 0.000 trackers.py:218(set_latest_action_name)
73924490 11.684 0.000 11.684 0.000 {built-in method builtins.hasattr}
2 0.001 0.001 10.860 5.430 agent.py:782(persist)
2 0.001 0.000 10.792 5.396 domain.py:785(persist)
2 0.000 0.000 10.785 5.392 utils.py:204(dump_obj_as_yaml_to_file)
2 0.000 0.000 10.785 5.392 io.py:228(write_yaml_file)
2 0.000 0.000 10.784 5.392 main.py:1208(dump)
2 0.002 0.001 10.784 5.392 main.py:1136(dump_all)
2 0.000 0.000 10.774 5.387 representer.py:83(represent)
2 0.001 0.001 10.734 5.367 serializer.py:102(serialize)
7292/2 0.058 0.000 10.723 5.362 serializer.py:155(serialize_node)
150 0.000 0.000 10.504 0.070 rasa.py:42(get_stories)
71804979 10.368 0.000 10.368 0.000 {built-in method __new__ of type object at 0x104f64610}
13602 3.067 0.000 10.210 0.001 resolver.py:271(resolve)
4 0.007 0.002 10.188 2.547 memoization.py:145(train)
6 0.002 0.000 9.936 1.656 validation.py:15(validate_yaml_schema)
4 0.244 0.061 8.935 2.234 featurizers.py:594(training_states_and_actions)
44196 0.228 0.000 8.889 0.000 std.py:1383(set_postfix)
72004254 8.827 0.000 8.827 0.000 {built-in method builtins.issubclass}
45501 0.104 0.000 8.782 0.000 std.py:1314(refresh)
45515 0.091 0.000 8.363 0.000 std.py:1444(display)
10440662 3.927 0.000 7.623 0.000 __init__.py:1160(apply_to)
We suspect this could be the issue https://github.com/RasaHQ/rasa/blob/850344ff457ab1c3bfbb37102fe21da55804ede3/rasa/core/trackers.py#L135
Additionally, this fails on Circle CI, probably due to lack of memory. Note that this is using a docker executor with 16GB of memory, and waiting 30 minutes for training. We are not yet sure of the exact cause of this failure, but what we know is that removing a dozen unfeaturized slots for the domain trains in about 15 minutes without issues.
Too long with no output (exceeded 30m0s): context deadline exceeded
Command or request that led to error:
rasa train -d data/domain.yml --out models -c config.yml
Content of configuration file (config.yml) (if relevant):
Content of domain file (domain.yml) (if relevant):
Cannot attach due to IP, but contains about 400 hundred slots, and only a handful are featurized ( but that's not the issue here). We did reproduce this in the past with only unfeaturized slots, the main variable is the number of slots.
@edouardlp thanks a lot for raising this - and especially for the detailed profiling. excessive deepcopying in preparing the trackers for training might not be necessary. we'll look into this 馃憤
Hi, @ricwo @edouardlp I think we can avoid this issue by customizing the copying behavior through implementation of the memo dictionary of an object already copied during current copying pass. The memo dictionary is used to keep track of the values that have been copied already, so as to avoid infinite recursion and hence avoids to fails in machines with low memory.
def __deepcopy__(self, memo):
print('__deepcopy__({})'.format(memo))
return MyClass(copy.deepcopy(self.name, memo))`
Thanks,
Your feedback would be extremely welcome.
Most helpful comment
@edouardlp thanks a lot for raising this - and especially for the detailed profiling. excessive deepcopying in preparing the trackers for training might not be necessary. we'll look into this 馃憤