Transformers: [testing] the test suite is many times slower than 2 weeks ago

Created on 18 Oct 2020 · 21 Comments · Source: huggingface/transformers

We will have reports generated on the CI side once https://github.com/huggingface/transformers/pull/7884 is merged, but we can already start looking at what caused a 4-5x slowdown in the test suite about 10 days ago. I'm not sure of the exact moment, but I checked a few reports and the change appears to have happened around Oct 8th, give or take a few days.
e.g. before:
https://app.circleci.com/pipelines/github/huggingface/transformers/13323/workflows/5984ea0e-e280-4a41-bc4a-b4a3d72fc411/jobs/95699
after:
https://app.circleci.com/pipelines/github/huggingface/transformers/13521/workflows/d235c864-66fa-4408-a787-2efab850a781/jobs/97329

@sshleifer suggested diagnosing this by adding the pytest --durations=N flag. However, if the cause is a missing @slow, that won't show up on my machine, because I already have all the models pre-downloaded. So the following reflects slow execution only:

Here is the report on my machine, running all tests normally:

$ pytest -n 3 --durations=0 tests
[...]
76.92s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_train_pipeline_custom_model
54.38s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_graph_mode
49.85s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_compile_tf_model
48.98s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_save_pretrained
44.11s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_compile_tf_model
38.42s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_graph_mode
35.94s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_tokenization_python_rust_equals
35.86s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_create_token_type_ids
35.81s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_embeded_special_tokens
35.58s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_max_length_equal
35.54s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_padding
35.36s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_is_fast
35.14s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_pretokenized_inputs
35.10s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_special_tokens_map_equal
35.07s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_num_special_tokens_to_add_equal
35.02s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_build_inputs_with_special_tokens
34.94s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_prepare_for_model
31.60s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_compile_tf_model
31.03s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_train_pipeline_custom_model
29.11s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_compile_tf_model
29.10s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_train_pipeline_custom_model
27.62s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_pt_tf_model_equivalence
26.36s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_compile_tf_model
25.12s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_save_pretrained
24.85s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_graph_mode
24.66s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_compile_tf_model
24.04s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_tokenization_python_rust_equals
23.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
23.10s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_graph_mode
23.08s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_padding
22.99s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_compile_tf_model
22.78s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_train_pipeline_custom_model
22.69s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keras_save_load
22.67s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_train_pipeline_custom_model
22.43s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_pretokenized_inputs
22.38s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_create_token_type_ids
22.35s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_prepare_for_model
22.28s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
22.25s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_max_length_equal
22.19s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_embeded_special_tokens
22.06s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_special_tokens_map_equal
21.95s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_is_fast
21.92s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_build_inputs_with_special_tokens
21.85s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_num_special_tokens_to_add_equal
21.61s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_graph_mode
21.49s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_add_special_tokens
21.32s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_add_tokens
21.21s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_alignement_methods
21.09s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_batch_encode_dynamic_overflowing
21.06s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_fast_only_inputs
20.95s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
20.86s call     tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_offsets_mapping
20.06s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
20.01s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_train_pipeline_custom_model
19.62s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_attention_outputs
19.39s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_attentions
18.78s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_save_pretrained
18.63s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_train_pipeline_custom_model
18.36s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_compile_tf_model
18.08s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_tokenization_python_rust_equals
17.85s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_save_load
17.54s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_train_pipeline_custom_model
17.39s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_train_pipeline_custom_model
17.28s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_embeded_special_tokens
17.25s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_special_tokens_map_equal
16.88s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_train_pipeline_custom_model
16.84s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_graph_mode
16.74s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript
16.73s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_padding
16.63s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_max_length_equal
16.56s call     tests/test_modeling_fsmt.py::FSMTModelTest::test_lm_head_model_random_beam_search_generate
16.55s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
16.53s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_build_inputs_with_special_tokens
16.53s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_is_fast
16.49s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_create_token_type_ids
16.45s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_num_special_tokens_to_add_equal
16.43s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_graph_mode
16.42s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_prepare_for_model
16.09s call     tests/test_tokenization_albert.py::AlbertTokenizationTest::test_pretokenized_inputs
16.02s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
15.80s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_train_pipeline_custom_model
15.50s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
15.30s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_train_pipeline_custom_model
15.00s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_graph_mode
14.96s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_compile_tf_model
14.07s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_train_pipeline_custom_model
14.03s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_pt_tf_model_equivalence
13.77s call     tests/test_modeling_rag.py::RagDPRT5Test::test_model_generate
13.29s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_hidden_states_output
13.10s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_model_outputs_equivalence
12.69s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_model_outputs_equivalence
12.43s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_save_pretrained
11.76s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_compile_tf_model
11.73s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_graph_mode
11.66s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_pt_tf_model_equivalence
11.63s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_special_tokens_map_equal
11.60s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_batch_encode_dynamic_overflowing
11.51s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_add_special_tokens
11.50s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_compile_tf_model
11.36s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_prepare_for_model
11.34s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_add_tokens
11.23s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_tokenization_python_rust_equals
11.19s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_fast_only_inputs
11.17s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_offsets_mapping
11.09s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_no_configs_xla
11.05s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_alignement_methods
11.04s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_is_fast
10.95s call     tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_offsets_with_special_characters
10.81s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_embeded_special_tokens
10.71s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_max_length_equal
10.59s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_build_inputs_with_special_tokens
10.59s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_num_special_tokens_to_add_equal
10.56s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_pt_tf_model_equivalence
10.42s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_hidden_state
10.39s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_create_token_type_ids
10.36s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_compile_tf_model
10.34s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_attention_outputs
10.31s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_model_outputs_equivalence
10.25s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript
10.15s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_attentions
9.78s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_pt_tf_model_equivalence
9.76s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_with_configs
9.76s call     tests/test_modeling_bert.py::BertModelTest::test_model_outputs_equivalence
9.72s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript
9.50s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_hidden_state
9.37s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_pt_tf_model_equivalence
9.31s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_batch_encode_dynamic_overflowing
9.31s call     tests/test_modeling_albert.py::AlbertModelTest::test_model_outputs_equivalence
9.30s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_save_load
9.10s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_model_outputs_equivalence
9.01s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_keras_save_load
8.88s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_add_tokens
8.81s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_offsets_mapping
8.80s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_keras_save_load
8.73s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_add_special_tokens
8.66s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_fast_only_inputs
8.60s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_pt_tf_model_equivalence
8.59s call     tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_alignement_methods
8.57s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_model_outputs_equivalence
8.50s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_compile_tf_model
8.34s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_summarization
8.33s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_lm_head_model_random_beam_search_generate
8.17s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript
8.01s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
7.96s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_model_outputs_equivalence
7.88s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_attention_outputs
7.87s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_graph_mode
7.84s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_attention_outputs
7.84s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript
7.58s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_attentions
7.57s call     tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_train_pipeline_custom_model
7.51s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_save_load
7.47s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_attention_outputs
7.46s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_keyword_and_dict_args
7.36s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_save_load
7.36s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_determinism
7.32s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_graph_mode
7.23s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_attention_outputs
7.16s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_text_generation
7.05s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_attentions
7.04s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_resize_token_embeddings
7.02s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_lm_head_model_random_beam_search_generate
6.90s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_train_pipeline_custom_model
6.88s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_hidden_states_output
6.82s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_save_load
6.73s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_lm_head_model_random_beam_search_generate
6.70s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_model_outputs_equivalence
6.60s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_save_load
6.57s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_keras_save_load
6.48s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_graph_mode
6.47s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_attention_outputs
6.44s call     tests/test_modeling_encoder_decoder.py::GPT2EncoderDecoderModelTest::test_encoder_decoder_model_generate
6.44s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_inputs_embeds
6.35s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_compile_tf_model
6.25s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_compile_tf_model
6.25s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_lm_head_model_random_beam_search_generate
6.11s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_loss_computation
6.05s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_attentions
5.98s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_save_load
5.93s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_train_pipeline_custom_model
5.77s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_compile_tf_model
5.72s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_attentions
5.69s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_hidden_state
5.65s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_fast_only_inputs
5.64s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_multigpu_data_parallel_forward
5.60s call     tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input
5.58s call     tests/test_modeling_electra.py::ElectraModelTest::test_model_outputs_equivalence
5.57s call     tests/test_modeling_rag.py::RagDPRBartTest::test_model_with_encoder_outputs
5.56s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_attentions
5.54s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_attention_outputs
5.54s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_padding
5.51s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_attentions
5.46s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_alignement_methods
5.46s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_offsets_mapping
5.44s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_add_special_tokens
5.40s call     tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_add_tokens
5.33s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_keras_save_load
5.30s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_hidden_state
5.27s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_graph_mode
5.22s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_hidden_states_output
5.20s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_graph_mode
5.12s call     tests/test_pipelines.py::NerPipelineTests::test_tf_only_ner
5.11s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_hidden_state
5.10s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_attentions
5.09s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_hidden_state
5.07s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_graph_mode
5.05s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
5.04s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_graph_mode
5.01s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_hidden_state
4.99s call     tests/test_modeling_fsmt.py::FSMTHeadTests::test_generate_fp16
4.94s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_model_outputs_equivalence
4.93s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_hidden_states_output
4.90s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_hidden_state
4.85s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript
4.84s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_train_pipeline_custom_model
4.82s call     tests/test_pipelines.py::QAPipelineTests::test_tf_question_answering
4.81s call     tests/test_modeling_encoder_decoder.py::BertEncoderDecoderModelTest::test_save_and_load_from_encoder_decoder_pretrained
4.78s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_hidden_state
4.76s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input
4.73s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
4.72s call     tests/test_pipelines.py::QAPipelineTests::test_torch_question_answering
4.68s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_pt_tf_model_equivalence
4.63s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
4.60s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_save_load
4.59s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_model_outputs_equivalence
4.59s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
4.57s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_encoder_decoder_with_configs
4.56s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_hidden_states_output
4.51s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_text2text
4.48s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_keras_save_load
4.48s call     tests/test_tokenization_marian.py::MarianTokenizationTest::test_tokenizer_equivalence_en_de
4.47s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_keras_save_load
4.44s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_attention_outputs
4.43s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_hidden_states_output
4.41s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_keyword_and_dict_args
4.39s call     tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
4.34s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_save_load
4.33s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_hidden_states_output
4.28s call     tests/test_tokenization_fsmt.py::FSMTTokenizationTest::test_pickle_tokenizer
4.28s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_text_generation
4.27s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_attentions
4.16s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_resize_token_embeddings
4.12s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_compile_tf_model
4.10s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_keras_save_load
4.09s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_train_pipeline_custom_model
4.09s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_save_load
4.08s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_head_pruning_integration
4.07s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_keras_save_load
4.06s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_attentions
4.04s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_fill_mask
3.96s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_determinism
3.96s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_pt_tf_model_equivalence
3.95s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_keras_save_load
3.95s setup    tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en
3.93s call     tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_model_outputs_equivalence
3.88s call     tests/test_pipelines.py::NerPipelineTests::test_ner_grouped
3.87s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_determinism
3.86s call     tests/test_pipelines.py::NerPipelineTests::test_torch_ner
3.86s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_feature_extraction
3.85s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_sentiment_analysis
3.82s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript
3.75s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_loss_computation
3.75s call     tests/test_modeling_t5.py::T5ModelTest::test_export_to_onnx
3.73s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_lm_head_model_random_beam_search_generate
3.72s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_keras_save_load
3.70s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_fill_mask_with_targets
3.69s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_hidden_states_output
3.64s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_hidden_state
3.62s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_fill_mask_with_targets
3.60s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_feature_extraction
3.60s call     tests/test_pipelines.py::NerPipelineTests::test_tf_ner
3.60s call     tests/test_pipelines.py::ZeroShotClassificationPipelineTests::test_torch_zero_shot_classification
3.59s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript
3.58s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_sentiment_analysis
3.57s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_auto_config
3.53s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_resize_token_embeddings
3.50s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_fill_mask
3.50s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_keras_save_load
3.50s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript
3.46s call     tests/test_modeling_bart.py::BARTModelTest::test_tiny_model
3.46s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_inputs_embeds
3.44s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_hidden_state
3.42s call     tests/test_modeling_xlnet.py::XLNetModelTest::test_model_outputs_equivalence
3.42s call     tests/test_pipelines.py::ZeroShotClassificationPipelineTests::test_tf_zero_shot_classification
3.42s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_pt_tf_model_equivalence
3.40s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_batch_generation_en_ROMANCE_multi
3.40s call     tests/test_tokenization_albert.py::AlbertTokenizationTest::test_maximum_encoding_length_pair_input
3.40s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_model_outputs_equivalence
3.39s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keyword_and_dict_args
3.36s call     tests/test_modeling_dpr.py::DPRModelTest::test_model_outputs_equivalence
3.35s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_batch_generation_en_de
3.33s call     tests/test_modeling_bart.py::BARTModelTest::test_model_outputs_equivalence
3.32s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_determinism
3.32s call     tests/test_pipelines.py::NerPipelineTests::test_tf_ner_grouped
3.31s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_tokenizer_handles_empty
3.30s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_resize_token_embeddings
3.29s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_pt_tf_model_equivalence
3.28s call     tests/test_modeling_bert.py::BertModelTest::test_head_pruning_integration
3.28s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_model_outputs_equivalence
3.27s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_attentions
3.26s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_model_outputs_equivalence
3.23s setup    tests/test_modeling_marian.py::TestMarian_en_zh::test_batch_generation_eng_zho
3.23s setup    tests/test_modeling_marian.py::TestMarian_EN_FR::test_batch_generation_en_fr
3.22s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript
3.20s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_attention_outputs
3.20s setup    tests/test_modeling_marian.py::TestMarian_RU_FR::test_batch_generation_ru_fr
3.19s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_lm_head_model_random_no_beam_search_generate
3.18s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_keras_save_load
3.17s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_pt_tf_model_equivalence
3.15s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_graph_mode
3.13s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_pt_tf_model_equivalence
3.13s setup    tests/test_modeling_marian.py::TestMarian_MT_EN::test_batch_generation_mt_en
3.12s call     tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_compile_tf_model
3.12s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_forward
3.11s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_model_outputs_equivalence
3.10s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_keyword_and_dict_args

I made a 3-sec cut-off for this listing.

@sshleifer, @sgugger, @LysandreJik, @thomwolf

stas00

Most helpful comment

I would change "all tests that need to do a training" to "all tests that need to do a real training".
I spent a lot of time making a mock training fast for the tests of the Trainer and I don't want that marked as slow ;-)

sgugger on 19 Oct 2020

❤1 👍1

All 21 comments

Please note that these are run times on my machine, not on CI, so evaluate the report relative to itself rather than against CI numbers. Once the PR is merged we will start getting CI reports.

Total run time 4248s.

On the torch side, test_tokenization_fast.py is the main culprit, adding up to about 1300 secs, roughly 1/3rd of the whole test run.

The largest share of the run time goes to the tf tests.

| Time (s) | Tests |
|----------|-------|
| 1300 | test_tokenization_fast.py |
| 1089 | the rest of torch tests |
| 1859 | tf tests |
| 4248 | Total |

Another thing I noticed: tests/test_modeling_marian.py spends about 36 secs in setup (11 x ~3.3 secs), which is very slow:

3.95s setup    tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en
3.57s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_auto_config
3.40s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_batch_generation_en_ROMANCE_multi
3.35s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_batch_generation_en_de
3.31s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_tokenizer_handles_empty
3.23s setup    tests/test_modeling_marian.py::TestMarian_en_zh::test_batch_generation_eng_zho
3.23s setup    tests/test_modeling_marian.py::TestMarian_EN_FR::test_batch_generation_en_fr
3.20s setup    tests/test_modeling_marian.py::TestMarian_RU_FR::test_batch_generation_ru_fr
3.13s setup    tests/test_modeling_marian.py::TestMarian_MT_EN::test_batch_generation_mt_en
3.12s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_forward
3.06s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
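To spot files with this kind of repeated setup cost, the report lines can be grouped by test file and phase. A minimal sketch, assuming the report lines have the three-column shape shown above; `seconds_by_file` is a hypothetical helper name, not part of pytest or the test suite:

```python
from collections import defaultdict


def seconds_by_file(lines, phase="setup"):
    """Return {test file: total seconds} for one phase of a --durations report."""
    totals = defaultdict(float)
    for line in lines:
        parts = line.split()
        # Expected shape: "<seconds>s <phase> <file>::<class>::<test>"
        if len(parts) != 3 or parts[1] != phase or not parts[0].endswith("s"):
            continue
        try:
            seconds = float(parts[0][:-1])  # drop the trailing "s"
        except ValueError:
            continue  # not a duration line after all
        test_file = parts[2].split("::")[0]
        totals[test_file] += seconds
    return dict(totals)


sample = [
    "3.95s setup    tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en",
    "3.57s setup    tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_auto_config",
    "3.40s setup    tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_batch_generation_en_ROMANCE_multi",
    "3.22s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript",
]
setup_totals = seconds_by_file(sample)
for test_file, seconds in setup_totals.items():
    print(f"{seconds:.2f}s setup total in {test_file}")
```

Passing phase="call" gives the per-file totals for the test bodies instead.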

stas00 on 18 Oct 2020

I fixed marian in https://github.com/huggingface/transformers/pull/7888. Do you want to try to fix test_tokenization_fast, or leave it to somebody else?

sshleifer on 18 Oct 2020

> Do you want to try to fix test_tokenization_fast

I will give it a go.

edit: Except it was just removed by the merge that just happened. So I have to start from scratch.

stas00 on 18 Oct 2020

❤1

https://github.com/huggingface/transformers/pull/7659 may have fixed the slow tokenization tests. Checking the most recent run it's back to ~2min for the torch-only job.
https://app.circleci.com/pipelines/github/huggingface/transformers/13951/workflows/244749ce-d1ee-488f-a59d-d891fbc38ed6/jobs/100800
I will check a few more and close it if that was the culprit.

stas00 on 19 Oct 2020

👍1

This test for some reason has @slow commented out. It takes 20+ seconds; can we put it back on?
It doesn't test functionality that is going to change much, so it should be safe to turn it off for normal CI runs.
https://github.com/huggingface/transformers/blob/master/tests/test_tokenization_auto.py#L42

pytest  --durations=0 tests/test_tokenization_auto.py &
22.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
4.42s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
2.75s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type
2.57s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_tokenizer_class
2.36s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained_identifier
2.06s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_from_pretrained_use_fast_toggle
2.05s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_with_correct_config

stas00 on 19 Oct 2020

This one should probably also be @slow - all the other tests around it are @slow:

15.51s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation

stas00 on 19 Oct 2020

I wrote a one-liner to calculate sub-totals for any pattern in the output of pytest --durations=0 (saved here as stats.txt), which looks like this:

22.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
4.42s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
2.75s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type
2.57s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_tokenizer_class

Total runtime:

$ cat stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
3308

Total tf runtime:

grep _tf_ stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
1609
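For those who prefer not to use perl, roughly the same sub-total can be computed in Python. This is a sketch under the assumption that each report line starts with the duration followed by `s`, the phase and the test id; `total_seconds` is a hypothetical helper name. Unlike grep, it matches the pattern against the test id only, not the whole line:

```python
import re

# Matches lines such as "22.15s call     tests/test_x.py::Class::test_name"
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+)")


def total_seconds(lines, pattern=""):
    """Sum the durations of all report lines whose test id contains `pattern`."""
    total = 0.0
    for line in lines:
        match = DURATION_LINE.match(line)
        if match and pattern in match.group(3):
            total += float(match.group(1))
    return total


sample = [
    "22.15s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained",
    "4.42s call     tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent",
    "3.17s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_pt_tf_model_equivalence",
]
print(int(total_seconds(sample)))          # total of all three lines
print(int(total_seconds(sample, "_tf_")))  # tf tests only
```

To run it on a real report, pass the open file instead of the sample list, e.g. `total_seconds(open("stats.txt"), "_tf_")`.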

stas00 on 19 Oct 2020

🚀2

Is this common test a good candidate for @slow?

grep test_model_outputs_equivalence stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
230

At least a few of them are quite slow:

20.11s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
16.19s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
13.49s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_model_outputs_equivalence
9.94s call     tests/test_modeling_bert.py::BertModelTest::test_model_outputs_equivalence
9.56s call     tests/test_modeling_albert.py::AlbertModelTest::test_model_outputs_equivalence
8.81s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_model_outputs_equivalence
8.29s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_model_outputs_equivalence
7.98s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_model_outputs_equivalence
7.87s call     tests/test_modeling_xlnet.py::XLNetModelTest::test_model_outputs_equivalence
6.85s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_model_outputs_equivalence
6.81s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_model_outputs_equivalence
6.30s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_model_outputs_equivalence
6.25s call     tests/test_modeling_roberta.py::RobertaModelTest::test_model_outputs_equivalence
5.90s call     tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_model_outputs_equivalence
5.81s call     tests/test_modeling_electra.py::ElectraModelTest::test_model_outputs_equivalence
5.79s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_model_outputs_equivalence
5.69s call     tests/test_modeling_xlm.py::XLMModelTest::test_model_outputs_equivalence
5.35s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_model_outputs_equivalence
4.64s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_model_outputs_equivalence
4.34s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_model_outputs_equivalence
3.79s call     tests/test_modeling_dpr.py::DPRModelTest::test_model_outputs_equivalence
3.71s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_model_outputs_equivalence
3.61s call     tests/test_modeling_bart.py::BARTModelTest::test_model_outputs_equivalence
3.58s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_model_outputs_equivalence
3.57s call     tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_model_outputs_equivalence
3.53s call     tests/test_modeling_ctrl.py::CTRLModelTest::test_model_outputs_equivalence
3.40s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_model_outputs_equivalence
3.31s call     tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_model_outputs_equivalence
3.19s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_model_outputs_equivalence
3.12s call     tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_model_outputs_equivalence
2.98s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_model_outputs_equivalence
2.93s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_model_outputs_equivalence
2.80s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
2.59s call     tests/test_modeling_longformer.py::LongformerModelTest::test_model_outputs_equivalence
2.37s call     tests/test_modeling_transfo_xl.py::TransfoXLModelTest::test_model_outputs_equivalence
2.17s call     tests/test_modeling_funnel.py::FunnelModelTest::test_model_outputs_equivalence
2.13s call     tests/test_modeling_fsmt.py::FSMTModelTest::test_model_outputs_equivalence
2.02s call     tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_model_outputs_equivalence
1.94s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_model_outputs_equivalence
1.70s call     tests/test_modeling_t5.py::T5ModelTest::test_model_outputs_equivalence
1.60s call     tests/test_modeling_deberta.py::DebertaModelTest::test_model_outputs_equivalence
1.44s call     tests/test_modeling_lxmert.py::LxmertModelTest::test_model_outputs_equivalence
1.22s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_model_outputs_equivalence
1.11s call     tests/test_modeling_reformer.py::ReformerLSHAttnModelTest::test_model_outputs_equivalence
0.86s call     tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_model_outputs_equivalence
0.33s call     tests/test_modeling_blenderbot.py::BlenderbotTesterMixin::test_model_outputs_equivalence

stas00 on 19 Oct 2020

Here is another possible candidate for @slow:

grep test_torchscript stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
289

18.89s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
18.65s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
11.37s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_hidden_state
11.02s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_attentions
9.69s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript
8.90s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
8.35s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript
7.82s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript
7.78s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_attentions
7.71s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_hidden_state
7.71s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_attentions
7.68s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_attentions
7.65s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript
7.37s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript
7.12s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_attentions
7.01s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_hidden_state
6.61s call     tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_hidden_state
6.51s call     tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_attentions
5.48s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_hidden_state
5.06s call     tests/test_modeling_xlm.py::XLMModelTest::test_torchscript
4.78s call     tests/test_modeling_xlnet.py::XLNetModelTest::test_torchscript_output_attentions
4.71s call     tests/test_modeling_xlm.py::XLMModelTest::test_torchscript_output_attentions
4.71s call     tests/test_modeling_xlnet.py::XLNetModelTest::test_torchscript_output_hidden_state
4.66s call     tests/test_modeling_xlm.py::XLMModelTest::test_torchscript_output_hidden_state
4.59s call     tests/test_modeling_xlnet.py::XLNetModelTest::test_torchscript
4.44s call     tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_attentions
4.32s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_attentions
3.98s call     tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_hidden_state
3.73s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_hidden_state
3.61s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_attentions
3.55s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_hidden_state
3.55s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript
3.54s call     tests/test_modeling_bert.py::BertModelTest::test_torchscript
3.50s call     tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_hidden_state
3.46s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript
3.36s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_hidden_state
3.33s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript
3.31s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_attentions
3.27s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_attentions
3.25s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_hidden_state
3.07s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_attentions
2.58s call     tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_hidden_state
2.56s call     tests/test_modeling_t5.py::T5ModelTest::test_torchscript_output_hidden_state
2.50s call     tests/test_modeling_fsmt.py::FSMTModelTest::test_torchscript_output_attentions
2.30s call     tests/test_modeling_t5.py::T5ModelTest::test_torchscript_output_attentions
2.28s call     tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_attentions
2.19s call     tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_torchscript_output_hidden_state
2.13s call     tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_attentions
2.02s call     tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_hidden_state
1.87s call     tests/test_modeling_fsmt.py::FSMTModelTest::test_torchscript_output_hidden_state
1.82s call     tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_torchscript_output_attentions
1.78s call     tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript
1.69s call     tests/test_modeling_bart.py::BARTModelTest::test_torchscript
1.51s call     tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript
1.34s call     tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_hidden_state
0.89s call     tests/test_modeling_fsmt.py::FSMTModelTest::test_torchscript
0.79s call     tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_torchscript
0.05s call     tests/test_modeling_lxmert.py::LxmertModelTest::test_torchscript_output_attentions
0.05s call     tests/test_modeling_lxmert.py::LxmertModelTest::test_torchscript_output_hidden_state
0.01s call     tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_torchscript_output_attentions
0.01s call     tests/test_modeling_ctrl.py::CTRLModelTest::test_torchscript_output_attentions
0.01s call     tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_torchscript_output_hidden_state

stas00 on 19 Oct 2020

I am fine with marking any of these @slow, but don't care as much now that the issue is resolved.

I do think we should have a repo-wide rule about what should be slow, and you have to write a comment if you want to override it.

Proposed Rule for testing.rst

All tests that take longer than 5s should be marked slow, that way we can save a lot of back and forth.
For common tests to be marked slow, the slowest iteration of that common test must be > 15s.

5s/15s was arbitrary, don't care much what the value is. WDYT @LysandreJik @sgugger ?
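To make the proposed rule concrete, here is a minimal sketch of how it could be checked against a durations report. The 5s/15s limits are the arbitrary values from the proposal; `slow_candidates` and the `common_tests` set are hypothetical names introduced for illustration, and which tests count as "common" is an input the caller has to supply:

```python
def slow_candidates(durations, common_tests, limit=5.0, common_limit=15.0):
    """Apply the proposed rule to {test id: seconds}.

    A regular test is a candidate if it takes longer than `limit`.
    A common test (shared by many model test classes) is a candidate only
    if its slowest iteration takes longer than `common_limit`.
    """
    slowest_common = {}
    candidates = set()
    for test_id, seconds in durations.items():
        name = test_id.rsplit("::", 1)[-1]
        if name in common_tests:
            # Track only the slowest iteration of each common test.
            slowest_common[name] = max(seconds, slowest_common.get(name, 0.0))
        elif seconds > limit:
            candidates.add(test_id)
    candidates.update(
        name for name, seconds in slowest_common.items() if seconds > common_limit
    )
    return sorted(candidates)


durations = {
    "tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence": 20.11,
    "tests/test_modeling_bert.py::BertModelTest::test_model_outputs_equivalence": 9.94,
    "tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation": 15.51,
    "tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type": 2.75,
}
for candidate in slow_candidates(durations, {"test_model_outputs_equivalence"}):
    print(candidate)
```

With the timings quoted earlier in this thread, the common test is flagged once by name (its slowest iteration is 20.11s), and test_torch_translation is flagged as a regular test.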

sshleifer on 19 Oct 2020

That's a fabulous suggestion!

A few caveats for testing.rst:

The < 5s threshold should include model download overhead. My numbers above exclude it, so the CI data, once we have it, will be the true measurement.
The 5s should be measured on CI, since timings differ from one machine to another.

stas00 on 19 Oct 2020

While we are at it - there is a ton of very slow tf tests - I suppose the same rule applies there, right?

stas00 on 19 Oct 2020

As much as I love moving quickly, we need to wait for others to agree to the rule before we apply it.
My proposed rule does not differentiate between tf and torch.

sshleifer on 19 Oct 2020

My communication wasn't clear - I meant to do that after we agreed on the threshold and regardless this PR https://github.com/huggingface/transformers/pull/7884 needs to be merged first to perform the correct measurements.

I just saw that there was a lot of tf tests that were very slow and which were not marked as such, so I thought perhaps there was a special reason for them not to be @slow.

stas00 on 19 Oct 2020

I'm not sure that setting a 5/15 or any other specific time limit to classify tests as slow would be best. Some tests, like test_model_outputs_equivalence, are important, and so is running them on contributors' PRs when their changes affect the modeling internals.

I think the following proposition would be more suited:

If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files, pipelines), then we should run that test in the non-slow test suite. If it's focused on another aspect of the library, such as the documentation or the examples, then we should run it in the slow test suite. And then, to refine this approach, we should have exceptions:

All tests that need a specific set of weights (e.g., model or tokenizer integration tests, pipeline integration tests) should be set to slow.
All tests that need to do a training (e.g., trainer integration tests) should be set to slow.
We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly long, and set them to slow. Some examples are some auto modeling tests, which save and load large files to disk, which are set to slow.
Others?

To that end, we should aim for all the non-slow tests to cover entirely the different internals, while making sure that the tests keep a fast execution time. Having some very small models in the tests (e.g., 2 layers, 10 vocab size, etc.) helps in that regard, as does having dummy sets of weights like the sshleifer/tiny-xxx-random weights. On that front, there seems to be something fishy going on with the MobileBERT model, as it's supposed to be an efficient model but takes a while to be tested. There's probably something to do for this model.

Willing to iterate on this wording, or specify/change some aspects if you think of something better.

Following this approach:

For the tokenization_auto tests, we can definitely uncomment the @slow.

For the MonoColumnInputTestCase, we can also set it as a slow test.
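The proposition above relies on a way to tag tests as slow and skip them in the default run. A minimal sketch of such a marker, assuming an opt-in environment variable (named RUN_SLOW here) and plain unittest. It is an illustration only, not the library's actual `slow` implementation, and `TinyModelTest` is a hypothetical test class:

```python
import os
import unittest


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set to a truthy value.

    The environment variable is read when the decorator is applied,
    i.e. when the test module is imported.
    """
    run_slow = os.environ.get("RUN_SLOW", "").lower() in {"1", "true", "yes"}
    return unittest.skipUnless(run_slow, "test is slow")(test_case)


class TinyModelTest(unittest.TestCase):
    def test_forward_tiny_config(self):
        # Fast: stands in for a test of the internals on a tiny random model.
        self.assertEqual(2 + 2, 4)

    @slow
    def test_pretrained_weights(self):
        # Slow: stands in for a test that needs a specific set of weights.
        self.assertTrue(True)
```

With a marker like this, the default run covers only the fast tests, and the full suite is opted into by setting the variable, e.g. `RUN_SLOW=1 pytest tests`.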

LysandreJik on 19 Oct 2020

❤1

sgugger on 19 Oct 2020

❤1 👍1

OK, sounds like @stas00 can mark a few as slow.
Longformer test also uses 5 layers for some reason, not sure if that matters.

sshleifer on 19 Oct 2020

@LysandreJik, just a clarification - so you propose not to have a fixed speed threshold in any of the "clauses". i.e. for non-essential tests as defined by you they should be marked as slow regardless of their speed, correct? I suppose this is smart since even very fast tests still add up to a lot since there could be many of them.

stas00 on 19 Oct 2020

OK, so here is the full non-slow run's report on CI:
https://pastebin.com/8pkaZKjH (quoted from this report)

The top slow ones are (cut off at 10sec):

131.69s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_train_pipeline_custom_model
101.16s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_graph_mode
79.24s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_compile_tf_model
40.68s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keras_save_load
38.58s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_compile_tf_model
35.05s call     tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
32.99s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_train_pipeline_custom_model
27.40s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_graph_mode
26.53s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_compile_tf_model
26.17s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_graph_mode
25.97s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
20.57s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_train_pipeline_custom_model
18.71s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_pt_tf_model_equivalence
18.59s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
17.73s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
17.72s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_train_pipeline_custom_model
17.27s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
17.15s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_compile_tf_model
16.72s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
16.49s call     tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_train_pipeline_custom_model
16.20s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_no_configs
15.91s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_pt_tf_model_equivalence
15.64s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_compile_tf_model
15.52s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_compile_tf_model
15.36s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
15.32s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_with_configs
15.28s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_compile_tf_model
15.28s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
15.24s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_pt_tf_model_equivalence
15.14s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
14.94s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_train_pipeline_custom_model
14.62s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_compile_tf_model
14.34s call     tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_graph_mode
14.32s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
14.12s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
13.75s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_compile_tf_model
13.72s call     tests/test_modeling_tf_t5.py::TFT5ModelTest::test_compile_tf_model
13.67s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_graph_mode
13.33s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input
13.12s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_attention_outputs
11.84s call     tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_save_load
11.78s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_attention_outputs
11.69s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_compile_tf_model
11.56s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_attention_outputs
11.54s call     tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_train_pipeline_custom_model
11.41s call     tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_train_pipeline_custom_model
11.40s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_train_pipeline_custom_model
11.35s call     tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_train_pipeline_custom_model
11.30s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_train_pipeline_custom_model
10.82s call     tests/test_modeling_tf_electra.py::TFElectraModelTest::test_graph_mode
10.77s call     tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_save_load
10.74s call     tests/test_modeling_tf_bert.py::TFBertModelTest::test_train_pipeline_custom_model
10.71s call     tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_graph_mode
10.60s call     tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_compile_tf_model
10.57s call     tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_save_load
10.55s call     tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input
10.39s call     tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_graph_mode
10.24s call     tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_graph_mode
10.08s call     tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_encoder_decoder_with_configs

@sshleifer, here is a highlight for you:

35.05s call     tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline

Other slow torch tests by group:

18.59s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
17.27s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
16.72s call     tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state

17.73s call     tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation

15.36s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
15.14s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
14.12s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
13.33s call     tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input

10.55s call     tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input

stas00 on 19 Oct 2020

> @LysandreJik, just a clarification - so you propose not to have a fixed speed threshold in any of the "clauses". i.e. for non-essential tests as defined by you they should be marked as slow regardless of their speed, correct? I suppose this is smart since even very fast tests still add up to a lot since there could be many of them.

Yes, I think that would be best! I don't think there are many non-essential tests that are not slow though. We'd still like to get full coverage of the library's internals using only non-@slow tests, so getting these tests below a certain time threshold would still be important so that every PR could get quick feedback on the CI's status.

LysandreJik on 20 Oct 2020

Thank you for that clarification, @LysandreJik

Please have a look at how your suggestions have been integrated into the testing doc:
https://github.com/huggingface/transformers/pull/7895/files

stas00 on 21 Oct 2020
