Transformers: Examples tests improvements

Created on 16 Jun 2020 · 12 comments · Source: huggingface/transformers

There are a few things about the examples/ tests that are suboptimal:

  1. They never use cuda or fp16, even if they are available.
  2. The @slow decorator used in the main tests is not importable, so there are no @slow tests.
  3. test_run_glue uses distilbert-base-cased. It should use a smaller model, one of the tiny family here or a new tiny model.
  4. There is no test coverage for TPU.

Any help on any of these fronts would be much appreciated!

Labels: Examples · Good First Issue · Help wanted · cleanup · wontfix

All 12 comments

Hi @sshleifer, I would like to work on this issue. Shall I take it up?

Yes, I would pick one item from the list to start with.
Make sure you pull first, I just merged some improvements.

@sshleifer I will work on the first one.

Just to be clear I will note down what I have understood and what I have in mind to do.

  1. The issue as I understand it: the tests in the examples folder fall short of what's needed and require some additions. As the first point suggests, when running the tests in the examples folder, they do not check whether CUDA or fp16 is available.

  2. There are 4 tests in test_examples.py:

     • text-classification (run_glue)
     • language-modeling (run_language_modeling)
     • question-answering (run_squad)
     • text-generation (run_generation)

     so each should run with CUDA or fp16 if available.

Correct me if I am wrong.

Good idea.

1) Yes. I think the desired behavior is
if torch.cuda.is_available():

  • assume fp16 is available
  • run the code with fp16 and CUDA.

Try to do that for all tests. Some will likely break. You can add a TODO to those and keep them running on CPU for now.

1b) You probably need a GPU to do this PR.
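The branching described above could be sketched like this (a minimal sketch; `extend_args_for_device` is a hypothetical helper, not part of the repo, and in the real tests the second argument would come from `torch.cuda.is_available()`):

```python
# Hypothetical helper (not in the repo): append device/precision flags to the
# argv list an example test builds before invoking the script's main().
def extend_args_for_device(testargs, cuda_available):
    if cuda_available:
        # Per the suggestion above, assume fp16 works whenever a GPU is present.
        return testargs + ["--fp16"]
    # Otherwise keep the test on CPU explicitly.
    return testargs + ["--no_cuda"]
```

A test would then run `extend_args_for_device(testargs, torch.cuda.is_available())` before patching `sys.argv`, so the same test exercises the GPU/fp16 path on CI machines that have one and falls back to CPU elsewhere.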

2) There are more tests than that:

$ ls examples/**/test*.py

examples/adversarial/test_hans.py
examples/summarization/bertabs/test_utils_summarization.py
examples/summarization/test_summarization_examples.py
examples/test_examples.py
examples/token-classification/test_ner_examples.py
examples/translation/t5/test_t5_examples.py

You don't need to cover all those tests. Feel free to break the work into very small PRs and tag me on them.

Thanks, @sshleifer for the clarification

I will start working on this.

2. The `@slow` decorator used in the main tests is not importable, so there are no @slow tests.

This is no longer the case.

from transformers.testing_utils import slow

This item can be removed.
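For context, the skip logic behind such a decorator can be sketched as follows (an illustrative reimplementation, not the actual transformers.testing_utils source): the decorated test only runs when the RUN_SLOW environment variable is set to a truthy value.

```python
import os
import unittest

# Illustrative sketch of a @slow decorator (assumption: this mirrors the
# documented behavior, it is not the actual transformers source): skip the
# decorated test unless RUN_SLOW is set to a truthy value.
def slow(test_case):
    run_slow = os.environ.get("RUN_SLOW", "0").lower() not in ("0", "false", "")
    return unittest.skipUnless(run_slow, "test is slow")(test_case)

class ExampleTest(unittest.TestCase):
    @slow
    def test_big_model(self):
        pass  # a real test would download and run a full-size model here
```

Running the suite normally skips `test_big_model`; running it with `RUN_SLOW=1` executes it.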

3. `test_run_glue` uses distilbert-base-cased. It should use a smaller model, one of the `tiny` family [here](https://huggingface.co/models?search=sshleifer/tiny) or a new tiny model.

I tried a few, and either they have the wrong head dimension (e.g. sshleifer/tiny-distilbert-base-cased has a (9x2) classifier head while the test expects (2x2), so it won't load as is: size mismatch for classifier.weight and classifier.bias), or they perform terribly with the current test settings.

I also ran an experiment with the model suggested inside the existing test:

    def test_run_language_modeling(self):
        stream_handler = logging.StreamHandler(sys.stdout)
        logger.addHandler(stream_handler)
        # TODO: switch to smaller model like sshleifer/tiny-distilroberta-base

with terrible results (perplexity > 5,000, whereas the current model gives < 35).

So when these tiny models are suggested as a replacement to speed things up, what is being sacrificed?

Happy to do big models and mark slow. I just don't want to do big models when we are only testing output shape.

Happy to do big models and mark slow. I just don't want to do big models when we are only testing output shape.

So then we could write a test that uses a tiny model that does just that? i.e. no outcome quality checks. Leaving big models for quality checks with @slow.

Yes!
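A shape-only test along those lines might look like this (a sketch, assuming a freshly initialized tiny DistilBERT config rather than any particular hub checkpoint; the config sizes are arbitrary small values chosen only to make the forward pass cheap):

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Tiny, randomly initialized model; num_labels=2 matches the (2x2) head the
# glue test expects. Sizes are arbitrary small values, not a real checkpoint.
config = DistilBertConfig(
    vocab_size=100, dim=32, n_layers=2, n_heads=4, hidden_dim=37, num_labels=2
)
model = DistilBertForSequenceClassification(config)
model.eval()

# No quality check, only the output shape: a batch of 2 sequences of length 8.
input_ids = torch.randint(0, config.vocab_size, (2, 8))
with torch.no_grad():
    logits = model(input_ids).logits
assert logits.shape == (2, 2)
```

Quality assertions (accuracy, perplexity) would then live only in @slow tests that load full-size models.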

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
