There are a few things about the examples/ tests that are suboptimal:
1. The example tests do not check whether CUDA or fp16 is available before running.
2. The `@slow` decorator used in the main tests is not importable, so there are no `@slow` tests.
3. `test_run_glue` uses `distilbert-base-cased`. It should use a smaller model, one of the `tiny` family [here](https://huggingface.co/models?search=sshleifer/tiny) or a new tiny model.

Any help on any of these fronts would be much appreciated!
Hi @sshleifer, I would like to work on this issue. Shall I take this up?
Yes, I would pick one item from the list to start with.
Make sure you pull first, I just merged some improvements.
@sshleifer I will work on the first one.
Just to be clear, I will note down what I have understood and what I have in mind to do.
The issue, as per my understanding: the tests in the examples folder are not up to the mark and we have to add certain parts to fix this. As the first point suggests, when running the tests in the examples folder, they do not check whether CUDA or fp16 is available.
There are 4 tests in test_examples.py.
Correct me if I am wrong.
Good idea.
1) Yes. I think the desired behavior is

```python
if torch.cuda.is_available():
```

Try to do that for all tests. Some will likely break. You can add a TODO to those and keep them running on CPU for now (see the sketch at the end of this comment).
1b) You probably need a GPU to do this PR.
2) There are more tests than that:
```
$ ls examples/**/test*.py
examples/adversarial/test_hans.py
examples/summarization/bertabs/test_utils_summarization.py
examples/summarization/test_summarization_examples.py
examples/test_examples.py
examples/token-classification/test_ner_examples.py
examples/translation/t5/test_t5_examples.py
```
You don't need to cover all those tests. Feel free to break the work into very small PRs and tag me on them.
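For point 1, a minimal sketch of the conditional (the `--fp16` and `--no_cuda` flags come from the example scripts' training arguments; the model name, data path, and output path are placeholders, not final choices):

```python
import sys
import unittest
from unittest.mock import patch

import torch

import run_glue  # assumes the examples directory is on sys.path


class ExamplesTests(unittest.TestCase):
    def test_run_glue(self):
        # Base arguments for the example script; values are placeholders.
        testargs = [
            "run_glue.py",
            "--model_name_or_path=sshleifer/tiny-distilbert-base-cased",
            "--data_dir=./tests/fixtures/tests_samples/MRPC/",
            "--task_name=mrpc",
            "--do_train",
            "--do_eval",
            "--output_dir=./tests/fixtures/tests_samples/temp_dir",
            "--overwrite_output_dir",
        ]
        if torch.cuda.is_available():
            # Exercise mixed precision on machines that have a GPU.
            testargs.append("--fp16")
        else:
            # Keep the test runnable on CPU-only machines.
            testargs.append("--no_cuda")

        with patch.object(sys, "argv", testargs):
            run_glue.main()
```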
Thanks, @sshleifer, for the clarification.
I will start working on this.
> 2. The `@slow` decorator used in the main tests is not importable, so there are no `@slow` tests.
This is no longer the case.
```python
from transformers.testing_utils import slow
```
This item can be removed.
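For reference, marking an example test as slow now looks like this (a minimal sketch; the test name and body are illustrative):

```python
import unittest

from transformers.testing_utils import slow


class ExamplesTests(unittest.TestCase):
    @slow
    def test_run_glue_with_full_size_model(self):
        # Skipped by default; runs only when the RUN_SLOW=1 environment variable is set.
        # A quality check against a full-size checkpoint would go here.
        pass
```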
> 3. `test_run_glue` uses `distilbert-base-cased`. It should use a smaller model, one of the `tiny` family [here](https://huggingface.co/models?search=sshleifer/tiny) or a new tiny model.
I tried a few, and either they have a wrong head dimension, as in sshleifer/tiny-distilbert-base-cased (9x2) while the test expects (2x2), so it won't load as is (`size mismatch for classifier.weight` and `size mismatch for classifier.bias`), or they perform terribly with the current test settings.
I also ran an experiment with the model suggested inside the existing test:
```python
def test_run_language_modeling(self):
    stream_handler = logging.StreamHandler(sys.stdout)
    logger.addHandler(stream_handler)
    # TODO: switch to smaller model like sshleifer/tiny-distilroberta-base
```
The results were terrible (perplexity > 5,000, whereas the current model achieves < 35).
So when these tiny models are suggested as a replacement to speed things up, what is expected to be sacrificed?
Happy to do big models and mark slow. I just don't want to do big models when we are only testing output shape.
> Happy to do big models and mark slow. I just don't want to do big models when we are only testing output shape.
So then we could write a test that uses a tiny model and does just that, i.e. no outcome-quality checks, leaving big models for quality checks with `@slow`? Something like the sketch below:
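A minimal sketch of the split (checkpoint names and the shape check are illustrative only, not a final proposal):

```python
import unittest

import torch

from transformers import AutoModelForSequenceClassification
from transformers.testing_utils import slow

# Illustrative checkpoint names, not final choices.
TINY_MODEL = "sshleifer/tiny-distilroberta-base"
FULL_MODEL = "distilbert-base-cased"


class RunGlueTests(unittest.TestCase):
    def test_tiny_model_output_shape(self):
        # Fast check with a tiny model: only verify the classification head is
        # wired up with the expected output shape, no claim about metric quality.
        model = AutoModelForSequenceClassification.from_pretrained(TINY_MODEL, num_labels=2)
        logits = model(torch.tensor([[0, 1, 2]]))[0]
        self.assertEqual(logits.shape, (1, 2))

    @slow
    def test_full_model_quality(self):
        # Quality checks (e.g. accuracy/perplexity thresholds) stay with the big
        # model and only run when RUN_SLOW=1 is set.
        model = AutoModelForSequenceClassification.from_pretrained(FULL_MODEL, num_labels=2)
        self.assertIsNotNone(model)  # placeholder for real metric assertions
```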
Yes!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.