There are a few things about the examples/ tests that are suboptimal:
1. The example tests do not check whether CUDA or fp16 is available before running.
2. The `@slow` decorator used in the main tests is not importable, so there are no `@slow` tests.
3. `test_run_glue` uses `distilbert-base-cased`. It should use a smaller model, one of the `tiny` family [here](https://huggingface.co/models?search=sshleifer/tiny) or a new tiny model.

Any help on any of these fronts would be much appreciated!
Hi @sshleifer, I would like to work on this issue. Shall I take this up?
Yes, I would pick one item from the list to start with.
Make sure you pull first, I just merged some improvements.
@sshleifer I will work on the first one.
Just to be clear, I will note down what I have understood and what I have in mind to do.
The issue, as per my understanding: the tests in the examples folder are not up to the mark and we have to add certain parts to fix this. As the first point suggests, when running the tests in the examples folder, they do not check whether CUDA or fp16 is available.
There are 4 tests in test_examples.py.
Correct me if I am wrong.
Good idea.
1) Yes. I think the desired behavior is

```python
if torch.cuda.is_available():
```

Try to do that for all tests. Some will likely break. You can add a TODO to those and keep them running on CPU for now (see the sketch at the end of this comment).
1b) You probably need a GPU to do this PR.
2) There are more tests than that:
```
$ ls examples/**/test*.py
examples/adversarial/test_hans.py
examples/summarization/bertabs/test_utils_summarization.py
examples/summarization/test_summarization_examples.py
examples/test_examples.py
examples/token-classification/test_ner_examples.py
examples/translation/t5/test_t5_examples.py
```
You don't need to cover all those tests. Feel free to break the work into very small PRs and tag me on them.
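For point 1, a minimal sketch of the conditional (the `--fp16` and `--no_cuda` flags come from the example scripts' training arguments; the model name, data path, and output path are placeholders, not final choices):

```python
import sys
import unittest
from unittest.mock import patch

import torch

import run_glue  # assumes the examples directory is on sys.path


class ExamplesTests(unittest.TestCase):
    def test_run_glue(self):
        # Base arguments for the example script; values are placeholders.
        testargs = [
            "run_glue.py",
            "--model_name_or_path=sshleifer/tiny-distilbert-base-cased",
            "--data_dir=./tests/fixtures/tests_samples/MRPC/",
            "--task_name=mrpc",
            "--do_train",
            "--do_eval",
            "--output_dir=./tests/fixtures/tests_samples/temp_dir",
            "--overwrite_output_dir",
        ]
        if torch.cuda.is_available():
            # Exercise mixed precision on machines that have a GPU.
            testargs.append("--fp16")
        else:
            # Keep the test runnable on CPU-only machines.
            testargs.append("--no_cuda")

        with patch.object(sys, "argv", testargs):
            run_glue.main()
```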
Thanks, @sshleifer, for the clarification.
I will start working on this.
> 2. The `@slow` decorator used in the main tests is not importable, so there are no `@slow` tests.
This is no longer the case.
```python
from transformers.testing_utils import slow
```
This item can be removed.
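For reference, marking an example test as slow now looks like this (a minimal sketch; the test name and body are illustrative):

```python
import unittest

from transformers.testing_utils import slow


class ExamplesTests(unittest.TestCase):
    @slow
    def test_run_glue_with_full_size_model(self):
        # Skipped by default; runs only when the RUN_SLOW=1 environment variable is set.
        # A quality check against a full-size checkpoint would go here.
        pass
```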
> 3. `test_run_glue` uses `distilbert-base-cased`. It should use a smaller model, one of the `tiny` family [here](https://huggingface.co/models?search=sshleifer/tiny) or a new tiny model.
I tried a few, and either they have a wrong head dimension, as in sshleifer/tiny-distilbert-base-cased (9x2) while the test expects (2x2), so it won't load as is (`size mismatch for classifier.weight` and `size mismatch for classifier.bias`), or they perform terribly with the current test settings.
I also ran an experiment with the model suggested inside the existing test:
```python
def test_run_language_modeling(self):
    stream_handler = logging.StreamHandler(sys.stdout)
    logger.addHandler(stream_handler)
    # TODO: switch to smaller model like sshleifer/tiny-distilroberta-base
```
The results were terrible (perplexity > 5,000, whereas the current model achieves < 35).
So when these tiny models are suggested as a replacement to speed things up, what is expected to be sacrificed?
Happy to do big models and mark slow. I just don't want to do big models when we are only testing output shape.
> Happy to do big models and mark slow. I just don't want to do big models when we are only testing output shape.
So then we could write a test that uses a tiny model and does just that, i.e. no outcome-quality checks, leaving big models for quality checks with `@slow`? Something like the sketch below:
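A minimal sketch of the split (checkpoint names and the shape check are illustrative only, not a final proposal):

```python
import unittest

import torch

from transformers import AutoModelForSequenceClassification
from transformers.testing_utils import slow

# Illustrative checkpoint names, not final choices.
TINY_MODEL = "sshleifer/tiny-distilroberta-base"
FULL_MODEL = "distilbert-base-cased"


class RunGlueTests(unittest.TestCase):
    def test_tiny_model_output_shape(self):
        # Fast check with a tiny model: only verify the classification head is
        # wired up with the expected output shape, no claim about metric quality.
        model = AutoModelForSequenceClassification.from_pretrained(TINY_MODEL, num_labels=2)
        logits = model(torch.tensor([[0, 1, 2]]))[0]
        self.assertEqual(logits.shape, (1, 2))

    @slow
    def test_full_model_quality(self):
        # Quality checks (e.g. accuracy/perplexity thresholds) stay with the big
        # model and only run when RUN_SLOW=1 is set.
        model = AutoModelForSequenceClassification.from_pretrained(FULL_MODEL, num_labels=2)
        self.assertIsNotNone(model)  # placeholder for real metric assertions
```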
Yes!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.