Fairseq: Replace use of argparse.Namespace with named args and kwargs where possible.

Created on 3 Feb 2020  路  15Comments  路  Source: pytorch/fairseq

馃殌 Feature Request

Remove argparse.Namespace parameter passing where possible.

Motivation

Using args makes code unclear to read because it couples implementations with argparse tightly. Functions and classes should rely on what they need to function and no more.

Pitch

With the exception of classmethods, arguments to functions and methods should be named and explicitly passed in. This makes the code:

  • More readable -- no more cluttering the code with args.X in implementations that _only_ care about X
  • More testable -- argparse is a runtime concern and won't/shouldn't be needed in tests
  • More portable -- if I want to use a class from fairseq in some other code, I don't need to "mock" argument parsing logic.
  • Less tightly coupled -- instead of guessing why some call-site passed in args, I know exactly what the minimal information required to compute some function will be.

Alternatives

Leave the code as is. 馃槃

Additional context

As I begin looking at the refactor for #1648, I see args buried deep in the code. This refactor would be much easier if this were implemented first.

enhancement help wanted

Most helpful comment

we are just getting started. the plan is to try to push pieces to OSS as we build them as opposed to landing one giant commit. we should have first changes out in the next couple of weeks.

All 15 comments

Yes, @alexeib and I have been discussing this. I agree that the first step is removing Namespaces where possible and moving to explicit named arguments/kwargs.

We've also been discussing adopting something like hydra as an additional interface to the current command line arguments. What do you think?

I'm not so familiar with hydra outside of its motivation. I think configuration files as opposed to strictly using command-line args is super desirable, though! In favor of this change. 馃憤

Oh wow, hydra seems very nice. I really like the idea of this now because it answers another question I've been having: how can I (easily) do hyperparameter tuning with fairseq? With --multirun, I could provide a hyperparam grid. Nice!

Have you considered switching to ConfigArgParse? It seems to be compatible with normal argparse library but offers handling configuration files and/or env vars.

I am using fairseq for a couple of weeks and I miss this functionality.

The fact that the namespaces are being pushed so far down into implementation is a bigger problem. Once that's addressed, swapping out the configuration mechanism should be a breeze.

This should probably be re-opened, but I'm curious: @myleott what do you think about having typed configs for models and tasks? The is how huggingface approaches it and is a middle-ground between having some per-component encapsulation and allowing for a flexible library experience.

we are currently working on integrating hydra (https://hydra.cc/) into fairseq. The latest version has typed configs etc. Stay tuned.

That's why I was asking. 馃槃 I have been working with hydra and would be happy to start making commits - I just want to make sure it's not going to die on the vine. The one caveat in my experiments with hydra is that it has no mechanism to support --user-dir semantics.

yeah that is a problem. I believe @omry is thinking about how to address it, or we can hack around it meanwhile. happy to work with you on this as well! (@kevinmtian is working on this with me on our side)

Ah, presumably you're working on it in a FB-side repo and not in the OSS? I'm not sure if there's a substantive way for me to help until a branch hits the OSS fairseq. Not sure if you have ideas about that, though. Hopefully I'll be gaining access to the PyTorch slack soon and we can have a lengthier discussion there if necessary.

we are just getting started. the plan is to try to push pieces to OSS as we build them as opposed to landing one giant commit. we should have first changes out in the next couple of weeks.

Awesome! I'll keep an eye out for new branches and will try to help out as I can.

  1. Source of truth for Hydra is the open source repo. There are some internal Facebook-only plugins but almost everything is in the repo.
  2. I am well aware of the current limitations with user specified config file / dir and will address them in a future version.
    there are workarounds we can discuss until it's ready.

Coreference is hard. 馃槅 "it" in my comment above was the fairseq branch for integration with hydra. Very excited about this!

Coreference is hard. "it" in my comment above was the fairseq branch for integration with hydra. Very excited about this!

Me too!

Was this page helpful?
0 / 5 - 0 ratings