We are going to have CI-side runtime reports once https://github.com/huggingface/transformers/pull/7884 is merged, but we can already start looking at what caused a 4-5x slowdown in the test suite about 10 days ago. I'm not sure of the exact moment, but I checked a few reports and the change appears to have happened around Oct 8th, +/- a few days.
e.g. before:
https://app.circleci.com/pipelines/github/huggingface/transformers/13323/workflows/5984ea0e-e280-4a41-bc4a-b4a3d72fc411/jobs/95699
after:
https://app.circleci.com/pipelines/github/huggingface/transformers/13521/workflows/d235c864-66fa-4408-a787-2efab850a781/jobs/97329
@sshleifer suggested diagnosing this by adding pytest's --durations=N flag. If the cause is a missing @slow, though, that won't show up on my machine, because I already have all the models pre-downloaded, so the following only shows slow execution.
Here is the report from my machine, running all tests normally:
$ pytest -n 3 --durations=0 tests
[...]
76.92s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_train_pipeline_custom_model
54.38s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_graph_mode
49.85s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_compile_tf_model
48.98s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_save_pretrained
44.11s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_compile_tf_model
38.42s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_graph_mode
35.94s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_tokenization_python_rust_equals
35.86s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_create_token_type_ids
35.81s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_embeded_special_tokens
35.58s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_max_length_equal
35.54s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_padding
35.36s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_is_fast
35.14s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_pretokenized_inputs
35.10s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_special_tokens_map_equal
35.07s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_num_special_tokens_to_add_equal
35.02s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_build_inputs_with_special_tokens
34.94s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_prepare_for_model
31.60s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_compile_tf_model
31.03s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_train_pipeline_custom_model
29.11s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_compile_tf_model
29.10s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_train_pipeline_custom_model
27.62s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_pt_tf_model_equivalence
26.36s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_compile_tf_model
25.12s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_save_pretrained
24.85s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_graph_mode
24.66s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_compile_tf_model
24.04s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_tokenization_python_rust_equals
23.15s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
23.10s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_graph_mode
23.08s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_padding
22.99s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_compile_tf_model
22.78s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_train_pipeline_custom_model
22.69s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keras_save_load
22.67s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_train_pipeline_custom_model
22.43s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_pretokenized_inputs
22.38s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_create_token_type_ids
22.35s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_prepare_for_model
22.28s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
22.25s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_max_length_equal
22.19s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_embeded_special_tokens
22.06s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_special_tokens_map_equal
21.95s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_is_fast
21.92s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_build_inputs_with_special_tokens
21.85s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_num_special_tokens_to_add_equal
21.61s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_graph_mode
21.49s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_add_special_tokens
21.32s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_add_tokens
21.21s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_alignement_methods
21.09s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_batch_encode_dynamic_overflowing
21.06s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_fast_only_inputs
20.95s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
20.86s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_offsets_mapping
20.06s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
20.01s call tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_train_pipeline_custom_model
19.62s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_attention_outputs
19.39s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_attentions
18.78s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_save_pretrained
18.63s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_train_pipeline_custom_model
18.36s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_compile_tf_model
18.08s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_tokenization_python_rust_equals
17.85s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_save_load
17.54s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_train_pipeline_custom_model
17.39s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_train_pipeline_custom_model
17.28s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_embeded_special_tokens
17.25s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_special_tokens_map_equal
16.88s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_train_pipeline_custom_model
16.84s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_graph_mode
16.74s call tests/test_modeling_electra.py::ElectraModelTest::test_torchscript
16.73s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_padding
16.63s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_max_length_equal
16.56s call tests/test_modeling_fsmt.py::FSMTModelTest::test_lm_head_model_random_beam_search_generate
16.55s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
16.53s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_build_inputs_with_special_tokens
16.53s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_is_fast
16.49s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_create_token_type_ids
16.45s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_num_special_tokens_to_add_equal
16.43s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_graph_mode
16.42s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_prepare_for_model
16.09s call tests/test_tokenization_albert.py::AlbertTokenizationTest::test_pretokenized_inputs
16.02s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
15.80s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_train_pipeline_custom_model
15.50s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
15.30s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_train_pipeline_custom_model
15.00s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_graph_mode
14.96s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_compile_tf_model
14.07s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_train_pipeline_custom_model
14.03s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_pt_tf_model_equivalence
13.77s call tests/test_modeling_rag.py::RagDPRT5Test::test_model_generate
13.29s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_hidden_states_output
13.10s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_model_outputs_equivalence
12.69s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_model_outputs_equivalence
12.43s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_save_pretrained
11.76s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_compile_tf_model
11.73s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_graph_mode
11.66s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_pt_tf_model_equivalence
11.63s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_special_tokens_map_equal
11.60s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_batch_encode_dynamic_overflowing
11.51s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_add_special_tokens
11.50s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_compile_tf_model
11.36s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_prepare_for_model
11.34s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_add_tokens
11.23s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_tokenization_python_rust_equals
11.19s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_fast_only_inputs
11.17s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_offsets_mapping
11.09s call tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_no_configs_xla
11.05s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_alignement_methods
11.04s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_is_fast
10.95s call tests/test_tokenization_fast.py::WordPieceFastTokenizerTest::test_offsets_with_special_characters
10.81s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_embeded_special_tokens
10.71s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_max_length_equal
10.59s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_build_inputs_with_special_tokens
10.59s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_num_special_tokens_to_add_equal
10.56s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_pt_tf_model_equivalence
10.42s call tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_hidden_state
10.39s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_create_token_type_ids
10.36s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_compile_tf_model
10.34s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_attention_outputs
10.31s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_model_outputs_equivalence
10.25s call tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript
10.15s call tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_attentions
9.78s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_pt_tf_model_equivalence
9.76s call tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_with_configs
9.76s call tests/test_modeling_bert.py::BertModelTest::test_model_outputs_equivalence
9.72s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript
9.50s call tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_hidden_state
9.37s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_pt_tf_model_equivalence
9.31s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_batch_encode_dynamic_overflowing
9.31s call tests/test_modeling_albert.py::AlbertModelTest::test_model_outputs_equivalence
9.30s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_save_load
9.10s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_model_outputs_equivalence
9.01s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_keras_save_load
8.88s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_add_tokens
8.81s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_offsets_mapping
8.80s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_keras_save_load
8.73s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_add_special_tokens
8.66s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_fast_only_inputs
8.60s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_pt_tf_model_equivalence
8.59s call tests/test_tokenization_fast.py::RobertaFastTokenizerTest::test_alignement_methods
8.57s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_model_outputs_equivalence
8.50s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_compile_tf_model
8.34s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_summarization
8.33s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_lm_head_model_random_beam_search_generate
8.17s call tests/test_modeling_albert.py::AlbertModelTest::test_torchscript
8.01s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
7.96s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_model_outputs_equivalence
7.88s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_attention_outputs
7.87s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_graph_mode
7.84s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_attention_outputs
7.84s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript
7.58s call tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_attentions
7.57s call tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_train_pipeline_custom_model
7.51s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_save_load
7.47s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_attention_outputs
7.46s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_keyword_and_dict_args
7.36s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_save_load
7.36s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_determinism
7.32s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_graph_mode
7.23s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_attention_outputs
7.16s call tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_text_generation
7.05s call tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_attentions
7.04s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_resize_token_embeddings
7.02s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_lm_head_model_random_beam_search_generate
6.90s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_train_pipeline_custom_model
6.88s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_hidden_states_output
6.82s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_save_load
6.73s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_lm_head_model_random_beam_search_generate
6.70s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_model_outputs_equivalence
6.60s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_save_load
6.57s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_keras_save_load
6.48s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_graph_mode
6.47s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_attention_outputs
6.44s call tests/test_modeling_encoder_decoder.py::GPT2EncoderDecoderModelTest::test_encoder_decoder_model_generate
6.44s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_inputs_embeds
6.35s call tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_compile_tf_model
6.25s call tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_compile_tf_model
6.25s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_lm_head_model_random_beam_search_generate
6.11s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_loss_computation
6.05s call tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_attentions
5.98s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_save_load
5.93s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_train_pipeline_custom_model
5.77s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_compile_tf_model
5.72s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_attentions
5.69s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_hidden_state
5.65s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_fast_only_inputs
5.64s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_multigpu_data_parallel_forward
5.60s call tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input
5.58s call tests/test_modeling_electra.py::ElectraModelTest::test_model_outputs_equivalence
5.57s call tests/test_modeling_rag.py::RagDPRBartTest::test_model_with_encoder_outputs
5.56s call tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_attentions
5.54s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_attention_outputs
5.54s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_padding
5.51s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_attentions
5.46s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_alignement_methods
5.46s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_offsets_mapping
5.44s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_add_special_tokens
5.40s call tests/test_tokenization_fast.py::NoPaddingTokenFastTokenizerMatchingTest::test_add_tokens
5.33s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_keras_save_load
5.30s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_hidden_state
5.27s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_graph_mode
5.22s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_hidden_states_output
5.20s call tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_graph_mode
5.12s call tests/test_pipelines.py::NerPipelineTests::test_tf_only_ner
5.11s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_hidden_state
5.10s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_attentions
5.09s call tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_hidden_state
5.07s call tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_graph_mode
5.05s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
5.04s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_graph_mode
5.01s call tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_hidden_state
4.99s call tests/test_modeling_fsmt.py::FSMTHeadTests::test_generate_fp16
4.94s call tests/test_modeling_distilbert.py::DistilBertModelTest::test_model_outputs_equivalence
4.93s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_hidden_states_output
4.90s call tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_hidden_state
4.85s call tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript
4.84s call tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_train_pipeline_custom_model
4.82s call tests/test_pipelines.py::QAPipelineTests::test_tf_question_answering
4.81s call tests/test_modeling_encoder_decoder.py::BertEncoderDecoderModelTest::test_save_and_load_from_encoder_decoder_pretrained
4.78s call tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_hidden_state
4.76s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input
4.73s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
4.72s call tests/test_pipelines.py::QAPipelineTests::test_torch_question_answering
4.68s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_pt_tf_model_equivalence
4.63s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
4.60s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_save_load
4.59s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_model_outputs_equivalence
4.59s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
4.57s call tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_encoder_decoder_with_configs
4.56s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_hidden_states_output
4.51s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_text2text
4.48s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_keras_save_load
4.48s call tests/test_tokenization_marian.py::MarianTokenizationTest::test_tokenizer_equivalence_en_de
4.47s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_keras_save_load
4.44s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_attention_outputs
4.43s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_hidden_states_output
4.41s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_keyword_and_dict_args
4.39s call tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
4.34s call tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_save_load
4.33s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_hidden_states_output
4.28s call tests/test_tokenization_fsmt.py::FSMTTokenizationTest::test_pickle_tokenizer
4.28s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_text_generation
4.27s call tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_attentions
4.16s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_resize_token_embeddings
4.12s call tests/test_modeling_tf_t5.py::TFT5ModelTest::test_compile_tf_model
4.10s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_keras_save_load
4.09s call tests/test_modeling_tf_t5.py::TFT5ModelTest::test_train_pipeline_custom_model
4.09s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_save_load
4.08s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_head_pruning_integration
4.07s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_keras_save_load
4.06s call tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_attentions
4.04s call tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_fill_mask
3.96s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_determinism
3.96s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_pt_tf_model_equivalence
3.95s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_keras_save_load
3.95s setup tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en
3.93s call tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_model_outputs_equivalence
3.88s call tests/test_pipelines.py::NerPipelineTests::test_ner_grouped
3.87s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_determinism
3.86s call tests/test_pipelines.py::NerPipelineTests::test_torch_ner
3.86s call tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_feature_extraction
3.85s call tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_sentiment_analysis
3.82s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript
3.75s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_loss_computation
3.75s call tests/test_modeling_t5.py::T5ModelTest::test_export_to_onnx
3.73s call tests/test_modeling_tf_t5.py::TFT5ModelTest::test_lm_head_model_random_beam_search_generate
3.72s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_keras_save_load
3.70s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_fill_mask_with_targets
3.69s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_hidden_states_output
3.64s call tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_hidden_state
3.62s call tests/test_pipelines.py::MonoColumnInputTestCase::test_tf_fill_mask_with_targets
3.60s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_feature_extraction
3.60s call tests/test_pipelines.py::NerPipelineTests::test_tf_ner
3.60s call tests/test_pipelines.py::ZeroShotClassificationPipelineTests::test_torch_zero_shot_classification
3.59s call tests/test_modeling_bert.py::BertModelTest::test_torchscript
3.58s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_sentiment_analysis
3.57s setup tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_auto_config
3.53s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_resize_token_embeddings
3.50s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_fill_mask
3.50s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_keras_save_load
3.50s call tests/test_modeling_dpr.py::DPRModelTest::test_torchscript
3.46s call tests/test_modeling_bart.py::BARTModelTest::test_tiny_model
3.46s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_inputs_embeds
3.44s call tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_hidden_state
3.42s call tests/test_modeling_xlnet.py::XLNetModelTest::test_model_outputs_equivalence
3.42s call tests/test_pipelines.py::ZeroShotClassificationPipelineTests::test_tf_zero_shot_classification
3.42s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_pt_tf_model_equivalence
3.40s setup tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_batch_generation_en_ROMANCE_multi
3.40s call tests/test_tokenization_albert.py::AlbertTokenizationTest::test_maximum_encoding_length_pair_input
3.40s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_model_outputs_equivalence
3.39s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keyword_and_dict_args
3.36s call tests/test_modeling_dpr.py::DPRModelTest::test_model_outputs_equivalence
3.35s setup tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_batch_generation_en_de
3.33s call tests/test_modeling_bart.py::BARTModelTest::test_model_outputs_equivalence
3.32s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_determinism
3.32s call tests/test_pipelines.py::NerPipelineTests::test_tf_ner_grouped
3.31s setup tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_tokenizer_handles_empty
3.30s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_resize_token_embeddings
3.29s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_pt_tf_model_equivalence
3.28s call tests/test_modeling_bert.py::BertModelTest::test_head_pruning_integration
3.28s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_model_outputs_equivalence
3.27s call tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_attentions
3.26s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_model_outputs_equivalence
3.23s setup tests/test_modeling_marian.py::TestMarian_en_zh::test_batch_generation_eng_zho
3.23s setup tests/test_modeling_marian.py::TestMarian_EN_FR::test_batch_generation_en_fr
3.22s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript
3.20s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_attention_outputs
3.20s setup tests/test_modeling_marian.py::TestMarian_RU_FR::test_batch_generation_ru_fr
3.19s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_lm_head_model_random_no_beam_search_generate
3.18s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_keras_save_load
3.17s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_pt_tf_model_equivalence
3.15s call tests/test_modeling_tf_t5.py::TFT5ModelTest::test_graph_mode
3.13s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_pt_tf_model_equivalence
3.13s setup tests/test_modeling_marian.py::TestMarian_MT_EN::test_batch_generation_mt_en
3.12s call tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_compile_tf_model
3.12s setup tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_forward
3.11s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_model_outputs_equivalence
3.10s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_keyword_and_dict_args
I made a 3-sec cut-off for this listing.
@sshleifer, @sgugger, @LysandreJik, @thomwolf
Please note that this is runtime on my machine and not CIs - so make sure you're evaluating the report relative to itself and not CI. Once the PR is merged we will start getting CI reports.
Total run time 4248s.
So it looks like on the torch side test_tokenization_fast.py is the main culprit, adding up to 1300 secs, roughly 1/3rd of the whole test run.
And the bulk of the slowdown is in the tf tests.
| time (secs) | tests |
|-------------|---------------------------------|
| 1300 | test_tokenization_fast.py |
| 1089 | the rest of torch tests |
| 1859 | tf tests |
| 4248 | Total |
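Per-file totals like the ones in this table can be derived from the `--durations` output itself. Below is a minimal sketch, assuming the `<seconds>s <phase> <path>::<class>::<test>` line format shown in the listing above; `DURATION_LINE` and `total_by_file` are illustrative names, not part of pytest or of this repo.

```python
import re
from collections import defaultdict

# Matches lines such as "76.92s call tests/test_x.py::SomeTest::test_name".
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+?)::")


def total_by_file(report_text, phase=None):
    """Sum reported durations per test file, optionally for a single phase."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = DURATION_LINE.match(line)
        if match is None:
            continue  # skip anything that is not a durations line
        seconds, line_phase, path = match.groups()
        if phase is not None and line_phase != phase:
            continue
        totals[path] += float(seconds)
    return dict(totals)


# A few lines copied from the report above, used as sample input.
sample = """\
76.92s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_train_pipeline_custom_model
48.98s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_save_pretrained
35.94s call tests/test_tokenization_fast.py::SentencePieceFastTokenizerTest::test_tokenization_python_rust_equals
3.95s setup tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en
"""

# Print the slowest files first.
for path, seconds in sorted(total_by_file(sample).items(), key=lambda kv: -kv[1]):
    print(f"{seconds:8.2f}s  {path}")
```

Passing `phase="setup"` restricts the totals to setup time only, which is a quick way to spot files where fixtures rather than test bodies dominate.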
Another thing I noticed: tests/test_modeling_marian.py spends about 36 secs in setup (11 tests x ~3.3 secs each), which is very slow:
3.95s setup tests/test_modeling_marian.py::TestMarian_FR_EN::test_batch_generation_fr_en
3.57s setup tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_auto_config
3.40s setup tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_batch_generation_en_ROMANCE_multi
3.35s setup tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_batch_generation_en_de
3.31s setup tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_tokenizer_handles_empty
3.23s setup tests/test_modeling_marian.py::TestMarian_en_zh::test_batch_generation_eng_zho
3.23s setup tests/test_modeling_marian.py::TestMarian_EN_FR::test_batch_generation_en_fr
3.20s setup tests/test_modeling_marian.py::TestMarian_RU_FR::test_batch_generation_ru_fr
3.13s setup tests/test_modeling_marian.py::TestMarian_MT_EN::test_batch_generation_mt_en
3.12s setup tests/test_modeling_marian.py::TestMarian_EN_DE_More::test_forward
3.06s setup tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
I fixed marian in https://github.com/huggingface/transformers/pull/7888. Do you want to try to fix test_tokenization_fast or let somebody else?
> Do you want to try to fix test_tokenization_fast
I will give it a go.
edit: Except it was just removed by the merge that just happened. So I have to start from scratch.
https://github.com/huggingface/transformers/pull/7659 may have fixed the slow tokenization tests. Checking the most recent run it's back to ~2min for the torch-only job.
https://app.circleci.com/pipelines/github/huggingface/transformers/13951/workflows/244749ce-d1ee-488f-a59d-d891fbc38ed6/jobs/100800
I will check a few more and close it if that was the culprit.
This test has @slow commented out for some reason. It takes 20+ seconds; can we put it back on?
It doesn't test functionality that is going to change much, so it should be safe to turn it off for normal CI runs.
https://github.com/huggingface/transformers/blob/master/tests/test_tokenization_auto.py#L42
pytest --durations=0 tests/test_tokenization_auto.py &
22.15s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
4.42s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
2.75s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type
2.57s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_tokenizer_class
2.36s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained_identifier
2.06s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_from_pretrained_use_fast_toggle
2.05s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_with_correct_config
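For reference, marking a test as slow amounts to a skip decorator gated on an opt-in switch. Below is a minimal sketch, assuming an environment variable named RUN_SLOW controls that switch. The `slow` helper and the test class are hypothetical stand-ins written for illustration, not the repository's own implementation:

```python
import os
import unittest


def slow(test_case):
    """Skip the decorated test unless RUN_SLOW is set to a truthy value.

    Hypothetical stand-in for a project-level @slow decorator; the
    variable name and the accepted values are assumptions of this sketch.
    """
    run_slow = os.environ.get("RUN_SLOW", "").lower() in {"1", "true", "yes"}
    return unittest.skipUnless(run_slow, "test is slow")(test_case)


class AutoTokenizerSketchTest(unittest.TestCase):
    @slow
    def test_tokenizer_from_pretrained(self):
        # Placeholder body; the real test loads many pretrained tokenizers.
        self.assertTrue(True)

    def test_quick_check(self):
        # Not decorated, so it always runs in the regular (non-slow) suite.
        self.assertEqual(1 + 1, 2)
```

With this shape, a plain `pytest` run skips the decorated test, while `RUN_SLOW=1 pytest` includes it.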
This one should probably also be @slow - all the other tests around it are @slow:
15.51s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
I wrote a one-liner to calculate the sub-totals for any pattern in the output of pytest --durations=0 (saved here as stats.txt), which has lines like:
22.15s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained
4.42s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent
2.75s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type
2.57s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_tokenizer_class
Total runtime:
$ cat stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
3308
Total tf runtime:
grep _tf_ stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
1609
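For anyone who prefers Python to perl, the same sub-total can be sketched as follows. This is an illustrative equivalent, not part of the original one-liner: the `subtotal` function name and the sample lines are assumptions, and the regex is slightly stricter than the perl version because it only accepts a leading number followed by `s`.

```python
import re

# Matches the leading "<seconds>s " of a `pytest --durations` line,
# e.g. "22.15s call tests/test_x.py::Test::test_y".
DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s")


def subtotal(lines, pattern=""):
    """Sum the durations (in seconds) of report lines containing `pattern`."""
    total = 0.0
    for line in lines:
        match = DURATION_RE.match(line)
        if match and pattern in line:
            total += float(match.group(1))
    return total


sample = [
    "22.15s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_pretrained",
    "4.42s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_identifier_non_existent",
    "3.12s call tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_compile_tf_model",
]

print(int(subtotal(sample)))          # total over all sample lines
print(int(subtotal(sample, "_tf_")))  # sub-total for tf tests only
```

To run it on a real report, pass `open("stats.txt")` instead of `sample`.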
Is this common test a good candidate for @slow?
grep test_model_outputs_equivalence stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
230
At least a few of them are quite slow:
20.11s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
16.19s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
13.49s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_model_outputs_equivalence
9.94s call tests/test_modeling_bert.py::BertModelTest::test_model_outputs_equivalence
9.56s call tests/test_modeling_albert.py::AlbertModelTest::test_model_outputs_equivalence
8.81s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_model_outputs_equivalence
8.29s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_model_outputs_equivalence
7.98s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_model_outputs_equivalence
7.87s call tests/test_modeling_xlnet.py::XLNetModelTest::test_model_outputs_equivalence
6.85s call tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_model_outputs_equivalence
6.81s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_model_outputs_equivalence
6.30s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_model_outputs_equivalence
6.25s call tests/test_modeling_roberta.py::RobertaModelTest::test_model_outputs_equivalence
5.90s call tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_model_outputs_equivalence
5.81s call tests/test_modeling_electra.py::ElectraModelTest::test_model_outputs_equivalence
5.79s call tests/test_modeling_distilbert.py::DistilBertModelTest::test_model_outputs_equivalence
5.69s call tests/test_modeling_xlm.py::XLMModelTest::test_model_outputs_equivalence
5.35s call tests/test_modeling_tf_t5.py::TFT5ModelTest::test_model_outputs_equivalence
4.64s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_model_outputs_equivalence
4.34s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_model_outputs_equivalence
3.79s call tests/test_modeling_dpr.py::DPRModelTest::test_model_outputs_equivalence
3.71s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_model_outputs_equivalence
3.61s call tests/test_modeling_bart.py::BARTModelTest::test_model_outputs_equivalence
3.58s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_model_outputs_equivalence
3.57s call tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_model_outputs_equivalence
3.53s call tests/test_modeling_ctrl.py::CTRLModelTest::test_model_outputs_equivalence
3.40s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_model_outputs_equivalence
3.31s call tests/test_modeling_tf_gpt2.py::TFGPT2ModelTest::test_model_outputs_equivalence
3.19s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_model_outputs_equivalence
3.12s call tests/test_modeling_tf_openai.py::TFOpenAIGPTModelTest::test_model_outputs_equivalence
2.98s call tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_model_outputs_equivalence
2.93s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_model_outputs_equivalence
2.80s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
2.59s call tests/test_modeling_longformer.py::LongformerModelTest::test_model_outputs_equivalence
2.37s call tests/test_modeling_transfo_xl.py::TransfoXLModelTest::test_model_outputs_equivalence
2.17s call tests/test_modeling_funnel.py::FunnelModelTest::test_model_outputs_equivalence
2.13s call tests/test_modeling_fsmt.py::FSMTModelTest::test_model_outputs_equivalence
2.02s call tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_model_outputs_equivalence
1.94s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_model_outputs_equivalence
1.70s call tests/test_modeling_t5.py::T5ModelTest::test_model_outputs_equivalence
1.60s call tests/test_modeling_deberta.py::DebertaModelTest::test_model_outputs_equivalence
1.44s call tests/test_modeling_lxmert.py::LxmertModelTest::test_model_outputs_equivalence
1.22s call tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_model_outputs_equivalence
1.11s call tests/test_modeling_reformer.py::ReformerLSHAttnModelTest::test_model_outputs_equivalence
0.86s call tests/test_modeling_tf_ctrl.py::TFCTRLModelTest::test_model_outputs_equivalence
0.33s call tests/test_modeling_blenderbot.py::BlenderbotTesterMixin::test_model_outputs_equivalence
Here is another possible candidate for @slow:
grep test_torchscript stats.txt | perl -ne 's|^(.*?)s.|$x+=$1|e; END {print int $x}'
289
18.89s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
18.65s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
11.37s call tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_hidden_state
11.02s call tests/test_modeling_bert.py::BertModelTest::test_torchscript_output_attentions
9.69s call tests/test_modeling_electra.py::ElectraModelTest::test_torchscript
8.90s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
8.35s call tests/test_modeling_albert.py::AlbertModelTest::test_torchscript
7.82s call tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript
7.78s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_attentions
7.71s call tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_hidden_state
7.71s call tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_attentions
7.68s call tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_attentions
7.65s call tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript
7.37s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript
7.12s call tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_attentions
7.01s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_hidden_state
6.61s call tests/test_modeling_roberta.py::RobertaModelTest::test_torchscript_output_hidden_state
6.51s call tests/test_modeling_squeezebert.py::SqueezeBertModelTest::test_torchscript_output_attentions
5.48s call tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_hidden_state
5.06s call tests/test_modeling_xlm.py::XLMModelTest::test_torchscript
4.78s call tests/test_modeling_xlnet.py::XLNetModelTest::test_torchscript_output_attentions
4.71s call tests/test_modeling_xlm.py::XLMModelTest::test_torchscript_output_attentions
4.71s call tests/test_modeling_xlnet.py::XLNetModelTest::test_torchscript_output_hidden_state
4.66s call tests/test_modeling_xlm.py::XLMModelTest::test_torchscript_output_hidden_state
4.59s call tests/test_modeling_xlnet.py::XLNetModelTest::test_torchscript
4.44s call tests/test_modeling_funnel.py::FunnelModelTest::test_torchscript_output_attentions
4.32s call tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_attentions
3.98s call tests/test_modeling_electra.py::ElectraModelTest::test_torchscript_output_hidden_state
3.73s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_hidden_state
3.61s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_attentions
3.55s call tests/test_modeling_bart.py::BARTModelTest::test_torchscript_output_hidden_state
3.55s call tests/test_modeling_dpr.py::DPRModelTest::test_torchscript
3.54s call tests/test_modeling_bert.py::BertModelTest::test_torchscript
3.50s call tests/test_modeling_albert.py::AlbertModelTest::test_torchscript_output_hidden_state
3.46s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript
3.36s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_hidden_state
3.33s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript
3.31s call tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_attentions
3.27s call tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_attentions
3.25s call tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript_output_hidden_state
3.07s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript_output_attentions
2.58s call tests/test_modeling_flaubert.py::FlaubertModelTest::test_torchscript_output_hidden_state
2.56s call tests/test_modeling_t5.py::T5ModelTest::test_torchscript_output_hidden_state
2.50s call tests/test_modeling_fsmt.py::FSMTModelTest::test_torchscript_output_attentions
2.30s call tests/test_modeling_t5.py::T5ModelTest::test_torchscript_output_attentions
2.28s call tests/test_modeling_openai.py::OpenAIGPTModelTest::test_torchscript_output_attentions
2.19s call tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_torchscript_output_hidden_state
2.13s call tests/test_modeling_distilbert.py::DistilBertModelTest::test_torchscript_output_attentions
2.02s call tests/test_modeling_gpt2.py::GPT2ModelTest::test_torchscript_output_hidden_state
1.87s call tests/test_modeling_fsmt.py::FSMTModelTest::test_torchscript_output_hidden_state
1.82s call tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_torchscript_output_attentions
1.78s call tests/test_modeling_funnel.py::FunnelBaseModelTest::test_torchscript
1.69s call tests/test_modeling_bart.py::BARTModelTest::test_torchscript
1.51s call tests/test_modeling_layoutlm.py::LayoutLMModelTest::test_torchscript
1.34s call tests/test_modeling_dpr.py::DPRModelTest::test_torchscript_output_hidden_state
0.89s call tests/test_modeling_fsmt.py::FSMTModelTest::test_torchscript
0.79s call tests/test_modeling_bert_generation.py::BertGenerationEncoderTest::test_torchscript
0.05s call tests/test_modeling_lxmert.py::LxmertModelTest::test_torchscript_output_attentions
0.05s call tests/test_modeling_lxmert.py::LxmertModelTest::test_torchscript_output_hidden_state
0.01s call tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_torchscript_output_attentions
0.01s call tests/test_modeling_ctrl.py::CTRLModelTest::test_torchscript_output_attentions
0.01s call tests/test_modeling_reformer.py::ReformerLocalAttnModelTest::test_torchscript_output_hidden_state
I am fine with marking any of these @slow, but I don't care as much now that the issue is resolved.
I do think we should have a repo-wide rule about what should be slow, and you have to write a comment if you want to override it.
5s/15s was arbitrary, don't care much what the value is. WDYT @LysandreJik @sgugger ?
That's a fabulous suggestion!
A few caveats for testing.rst:
While we are at it - there is a ton of very slow tf tests - I suppose the same rule applies there, right?
As much as I love moving quickly, we need to wait for others to agree to the rule before we apply it.
My proposed rule does not differentiate between tf and torch.
My communication wasn't clear - I meant to do that after we agreed on the threshold. Regardless, this PR https://github.com/huggingface/transformers/pull/7884 needs to be merged first so that we can take correct measurements.
I just saw that there were a lot of tf tests that were very slow and not marked as such, so I thought perhaps there was a special reason for them not to be @slow.
I'm not sure that setting a 5/15 or any other specific time limit to classify tests as slow would be best. Some tests, like test_model_outputs_equivalence, are important, and so is running them on contributors' PRs when their changes affect the modeling internals.
I think the following proposition would be more suited:
If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files, pipelines), then we should run that test in the non-slow test suite. If it's focused on another aspect of the library, such as the documentation or the examples, then we should run it in the slow test suite. And then, to refine this approach, we should have exceptions:
To that end, we should aim for the non-slow tests to fully cover the different internals, while making sure that they keep a fast execution time. Having some very small models in the tests (e.g., 2 layers, 10 vocab size, etc.) helps in that regard, as does having dummy sets of weights like the sshleifer/tiny-xxx-random weights. On that front, there seems to be something fishy going on with the MobileBERT model, as it's supposed to be an efficient model but takes a while to be tested. There's probably something to do for this model.
Willing to iterate on this wording, or specify/change some aspects if you think of something better.
Following this approach:
For the tokenization_auto tests, we can definitely uncomment the @slow.
For the MonoColumnInputTestCase, we can also set it as a slow test.
I would change "all things that need to do a training" to "all things that need to do a real training".
I spent a lot of time making a mock training fast for the tests of the Trainer and I don't want that marked as slow ;-)
OK, sounds like @stas00 can mark a few as slow.
Longformer test also uses 5 layers for some reason, not sure if that matters.
@LysandreJik, just a clarification - so you propose not to have a fixed speed threshold in any of the "clauses". i.e. for non-essential tests as defined by you they should be marked as slow regardless of their speed, correct? I suppose this is smart since even very fast tests still add up to a lot since there could be many of them.
OK, so here is the full non-slow run's report on CI:
https://pastebin.com/8pkaZKjH (quoted from this report)
The top slow ones are (cut off at 10sec):
131.69s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_train_pipeline_custom_model
101.16s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_graph_mode
79.24s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_compile_tf_model
40.68s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_keras_save_load
38.58s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_compile_tf_model
35.05s call tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
32.99s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_train_pipeline_custom_model
27.40s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_graph_mode
26.53s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_compile_tf_model
26.17s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_graph_mode
25.97s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_model_outputs_equivalence
20.57s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_train_pipeline_custom_model
18.71s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_pt_tf_model_equivalence
18.59s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
17.73s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
17.72s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_train_pipeline_custom_model
17.27s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
17.15s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_compile_tf_model
16.72s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
16.49s call tests/test_modeling_tf_funnel.py::TFFunnelBaseModelTest::test_train_pipeline_custom_model
16.20s call tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_no_configs
15.91s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_pt_tf_model_equivalence
15.64s call tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_compile_tf_model
15.52s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_compile_tf_model
15.36s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
15.32s call tests/test_benchmark_tf.py::TFBenchmarkTest::test_train_with_configs
15.28s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_compile_tf_model
15.28s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_model_outputs_equivalence
15.24s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_pt_tf_model_equivalence
15.14s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
14.94s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_train_pipeline_custom_model
14.62s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_compile_tf_model
14.34s call tests/test_modeling_tf_funnel.py::TFFunnelModelTest::test_graph_mode
14.32s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_model_outputs_equivalence
14.12s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
13.75s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_compile_tf_model
13.72s call tests/test_modeling_tf_t5.py::TFT5ModelTest::test_compile_tf_model
13.67s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_graph_mode
13.33s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input
13.12s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_attention_outputs
11.84s call tests/test_modeling_tf_mobilebert.py::TFMobileBertModelTest::test_save_load
11.78s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_attention_outputs
11.69s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_compile_tf_model
11.56s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_attention_outputs
11.54s call tests/test_modeling_tf_transfo_xl.py::TFTransfoXLModelTest::test_train_pipeline_custom_model
11.41s call tests/test_modeling_tf_lxmert.py::TFLxmertModelTest::test_train_pipeline_custom_model
11.40s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_train_pipeline_custom_model
11.35s call tests/test_modeling_tf_roberta.py::TFRobertaModelTest::test_train_pipeline_custom_model
11.30s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_train_pipeline_custom_model
10.82s call tests/test_modeling_tf_electra.py::TFElectraModelTest::test_graph_mode
10.77s call tests/test_modeling_tf_distilbert.py::TFDistilBertModelTest::test_save_load
10.74s call tests/test_modeling_tf_bert.py::TFBertModelTest::test_train_pipeline_custom_model
10.71s call tests/test_modeling_tf_xlm.py::TFXLMModelTest::test_graph_mode
10.60s call tests/test_modeling_tf_albert.py::TFAlbertModelTest::test_compile_tf_model
10.57s call tests/test_modeling_tf_longformer.py::TFLongformerModelTest::test_save_load
10.55s call tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input
10.39s call tests/test_modeling_tf_flaubert.py::TFFlaubertModelTest::test_graph_mode
10.24s call tests/test_modeling_tf_xlnet.py::TFXLNetModelTest::test_graph_mode
10.08s call tests/test_benchmark_tf.py::TFBenchmarkTest::test_inference_encoder_decoder_with_configs
@sshleifer, here is a highlight for you:
35.05s call tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline
Other slow torch tests by group:
18.59s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_attentions
17.27s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript
16.72s call tests/test_modeling_mobilebert.py::MobileBertModelTest::test_torchscript_output_hidden_state
17.73s call tests/test_pipelines.py::MonoColumnInputTestCase::test_torch_translation
15.36s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_pretokenized_inputs
15.14s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_pair_input
14.12s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_add_special_tokens
13.33s call tests/test_tokenization_deberta.py::DebertaTokenizationTest::test_maximum_encoding_length_single_input
10.55s call tests/test_modeling_blenderbot.py::Blenderbot90MIntegrationTests::test_90_generation_from_short_input
> @LysandreJik, just a clarification - so you propose not to have a fixed speed threshold in any of the "clauses". i.e. for non-essential tests as defined by you they should be marked as slow regardless of their speed, correct? I suppose this is smart since even very fast tests still add up to a lot since there could be many of them.
Yes, I think that would be best! I don't think there are many non-essential tests that are not slow though. We'd still like to get full coverage of the library's internals using only non-@slow tests, so getting these tests below a certain time threshold would still be important so that every PR could get quick feedback on the CI's status.
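To keep an eye on that, the durations report can be filtered for non-slow tests whose call phase exceeds some budget. Below is a minimal sketch, assuming the report lines have the `<seconds>s <phase> <test id>` shape shown in this thread. The 10-second budget and the `over_threshold` name are placeholders for illustration, not an agreed value or an existing tool:

```python
import re

# "<seconds>s <phase> <test id>", as printed by `pytest --durations=N`.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+)")


def over_threshold(lines, threshold_s):
    """Return (seconds, test_id) pairs for 'call' phases slower than threshold_s."""
    flagged = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) == "call" and float(match.group(1)) > threshold_s:
            flagged.append((float(match.group(1)), match.group(3)))
    # Slowest first, so the worst offenders are at the top.
    return sorted(flagged, reverse=True)


report = [
    "35.05s call tests/test_modeling_marian.py::TestMarian_en_ROMANCE::test_pipeline",
    "3.13s setup tests/test_modeling_marian.py::TestMarian_MT_EN::test_batch_generation_mt_en",
    "2.75s call tests/test_tokenization_auto.py::AutoTokenizerTest::test_tokenizer_from_model_type",
]

# Hypothetical budget of 10 seconds per test call.
for seconds, test_id in over_threshold(report, threshold_s=10.0):
    print(f"{seconds:7.2f}s  {test_id}")
```

Setup and teardown phases are ignored here on purpose; they could be reported separately if fixture cost is the concern.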
Thank you for that clarification, @LysandreJik
Please have a look at how your suggestions have been integrated into the testing doc:
https://github.com/huggingface/transformers/pull/7895/files