Transformers: benchmarking API: `no_` arguments, double negation, defaults

Created on 3 Aug 2020 · 9 comments · Source: huggingface/transformers

Another discussion moved to an issue:

Here is the gist of it:

This help entry: https://github.com/huggingface/transformers/blob/master/src/transformers/benchmark/benchmark_args_utils.py#L74

goes:

"help": "Don't use multiprocessing for memory and speed measurement. It is highly recommended to 
use multiprocessing for accurate CPU and GPU memory measurements.

This reads like a contradiction. That said, the first sentence "mimics" the rest of the "help" entries, most of which start with "Don't use", so while it is technically correct, it is not user-friendly.

And then while most args start with no_, there are some that are normal:

    no_inference: bool = field(default=False, metadata={"help": "Don't benchmark inference of model"})
    no_cuda: bool = field(default=False, metadata={"help": "Whether to run on available cuda devices"})
    no_tpu: bool = field(default=False, metadata={"help": "Whether to run on available tpu devices"})
    fp16: bool = field(default=False, metadata={"help": "Use FP16 to accelerate inference."})
    training: bool = field(default=False, metadata={"help": "Benchmark training of model"})

and also the code is quite hard to read because of multiple double negations, e.g. here:

            for batch_size in self.args.batch_sizes:
                for sequence_length in self.args.sequence_lengths:
                    if not self.args.no_inference:
                        if not self.args.no_memory:
                            memory, inference_summary = self.inference_memory(model_name, batch_size, sequence_length)
                            inference_result_memory[model_name]["result"][batch_size][sequence_length] = memory
                        if not self.args.no_speed:
                            time = self.inference_speed(model_name, batch_size, sequence_length)
                            inference_result_time[model_name]["result"][batch_size][sequence_length] = time
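For comparison, here is a self-contained sketch of how the same logic reads without the double negation. The positive `inference`/`memory`/`speed` attributes are hypothetical renames of `no_inference`/`no_memory`/`no_speed`, not the current API:

```python
from itertools import product

# Hypothetical positive-flag args object; in the real code this would be
# the benchmark args dataclass after renaming the no_* fields.
class Args:
    inference = True
    memory = True
    speed = True
    batch_sizes = [1, 8]
    sequence_lengths = [32, 128]

args = Args()
results = []
for batch_size, sequence_length in product(args.batch_sizes, args.sequence_lengths):
    # No "if not args.no_inference" double negation: the condition
    # reads the way one would say it aloud.
    if args.inference:
        if args.memory:
            results.append(("memory", batch_size, sequence_length))
        if args.speed:
            results.append(("speed", batch_size, sequence_length))
```

With every flag enabled, each of the four (batch_size, sequence_length) combinations records both a memory and a speed entry.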

Further in benchmark.py, we have:

            if self.args.is_tpu or self.args.torchscript:

but the CLI arg was `no_tpu`, while here it is `is_tpu` (the same goes for the other `no_` args).


@patrickvonplaten suggested:

the reason for having both no_ and "no-negation-prefix" arguments is that all boolean arguments should be set to False as a default: e.g. the default should be to benchmark "inference", but to not benchmark "training". Guess it's up for discussion whether this is the best design choice.

I see why "Don't ..." can be confusing! Maybe it's best if we change all no_ arguments to "Disable ...." help descriptions and all "no-negation-prefix" arguments to "Enable ...." - Wdyt?

wontfix

All 9 comments

the reason for having both no_ and "no-negation-prefix" arguments is that all boolean arguments should be set to False as a default: e.g. the default should be to benchmark "inference", but to not benchmark "training". Guess it's up for discussion whether this is the best design choice.

Why can't the defaults do the inversion, e.g.:

    inference: bool = field(default=True,  metadata={"help": "Benchmark inference of model"})
    training:  bool = field(default=False, metadata={"help": "Benchmark training of model"})
    fp16:      bool = field(default=False, metadata={"help": "Use FP16 to accelerate inference."})
    verbose:   bool = field(default=False, metadata={"help": "Verbose memory tracing"})
    speed:     bool = field(default=True,  metadata={"help": "Perform speed measurements"})
    memory:    bool = field(default=True,  metadata={"help": "Perform memory measurements"})

Note: I removed `no_` from the arg names.

After hearing how this CLI API came about (less typing for the most common uses), my suggestion is that, in addition to the normal "positive" API, it should be possible to add aliases that compound several of the most commonly used options into one short flag. It doesn't even have to be meaningful, e.g.:

`-9` to be the same as `--inference=True --training=False --memory=True --speed=False`

etc. It will be memorized quickly after a few lookups.

A few lines of code could loop over sys.argv and expand the super-short aliases into normal flags before argparse is invoked.

Hey @stas00,

After thinking a bit about it, I agree that the library should define only positive args. It's actually handled quite nicely in the `HfArgumentParser`, I noticed:
https://github.com/huggingface/transformers/blob/7ea9b2db3732904014b9121fb8a5c896ae00d4cf/src/transformers/hf_argparser.py#L70

This line means that if one adds a positive argument such as inference and sets the default to True, then passing --no-inference on the command line (without any following value) sets self.args.inference = False (no-inference is the command-line flag, but the value is stored as args.inference = False rather than as args.no-inference). This is quite intuitive IMO, but we should also add a line in the --help description explaining the functionality.

So I think it would be great if we could actually make all args in the library consistent so that the name of every arg is positive -> there should be no args.no_<something>, e.g. args.no_inference, used anywhere, but only args.inference. I think this concerns not only the benchmarking args, but also the data collator and training args.
Then we can add to the docs and in the helper descriptions that args that default to True can be disabled automatically via no-<name-of-arg>.
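The behavior described above can be approximated in plain argparse (this is a sketch of the idea, not HfArgumentParser's actual implementation; the `inference` arg name is taken from the proposal above):

```python
import argparse

parser = argparse.ArgumentParser()
# A positive arg that defaults to True; the user-facing flag is the
# generated negative form, which stores False into the positive name.
parser.add_argument(
    "--no_inference",
    action="store_false",
    dest="inference",
    help="Disable benchmarking inference of the model",
)
parser.set_defaults(inference=True)

on = parser.parse_args([])                   # default: inference enabled
off = parser.parse_args(["--no_inference"])  # flag flips args.inference to False
```

Note that only `args.inference` ever exists; `args.no_inference` is never stored, which is exactly the consistency being proposed.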

Thinking in terms of breaking backward compatibility, I think the arg names and the default values can still be changed, since people use it mostly via the command line and can easily adapt their code.

Looping in @julien-c @sgugger @LysandreJik @sshleifer to check if this "clean-up" is ok.

In a second step we can think about adding shortcuts for combinations of args for benchmarks I think.

I agree that having only positive arguments would be clean, especially with regards to the double negatives which make the code less intuitive.

In terms of backwards-compatibility, I think it would be easy enough to do such a change without any breaking change. Similar to the line you linked @patrickvonplaten, the (previously) negative arguments can still be handled.
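Such backward-compatible handling might look like the following shim (hypothetical code, not anything in transformers; the mapping table is illustrative):

```python
import warnings

# Map legacy negative arg names to their proposed positive replacements.
LEGACY = {"no_inference": "inference", "no_speed": "speed", "no_memory": "memory"}

def translate_legacy(kwargs):
    """Accept old no_<name> kwargs, warn, and store the inverted value
    under the new positive name."""
    out = dict(kwargs)
    for old, new in LEGACY.items():
        if old in out:
            warnings.warn(f"`{old}` is deprecated; use `{new}` instead", FutureWarning)
            out[new] = not out.pop(old)
    return out
```

For example, a caller still passing `no_inference=True` would end up with `inference=False`, so old scripts keep working while the deprecation warning nudges them toward the new names.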

Shortcuts for combinations of args sounds like a very nice idea as well, but I agree with @patrickvonplaten that it can be handled in another PR.

I also agree with everything @patrickvonplaten and @LysandreJik said. Positive arguments are easier to document, and @stas00's point on code readability is completely spot-on.

I'm fine with the change.

I would only add that changing CLI requires a fair amount of manual care and testing because many bash scripts and README.md commands are not run by CI.

I think positive flags with `action="store_true"` like `--inference` are nice and would prefer not writing out `=True` all the time for them. I don't feel that strongly, however.

Awesome, I guess if someone feels like it, we could start a PR for it :-)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
