Currently I am just manually rerunning the model with different config files and comparing results. Are there any smarter approaches?
Unfortunately, that's the best that's available at the moment. Just this week we had some conversations with @dodgejesse about better ways of doing hyperparameter tuning, but those ideas are all on how to select values to try, not how to actually run experiments.
One thing that may make things easier is that the config files are HOCON, which supports environment variables in most cases, and you can use the --overrides flag in cases where environment variables don't work (like when we need to pass parameters to pytorch code). This lets you set up a bunch of experiments in a shell script, for example, without having to actually modify the config file. You still need a way of submitting those experiments to a cluster, or running them sequentially on your machine...
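As a rough illustration of the idea above, here is a minimal Python sketch that builds one `allennlp train` command per grid point using `--overrides`, without touching the config file. The config path, the `runs` output directory, and the parameter names in `grid` are hypothetical placeholders; the script only prints the commands, so submitting or running them is still up to you.

```python
import itertools
import json
import shlex


def nest(dotted_key, value):
    """Turn "a.b.c" and a value into {"a": {"b": {"c": value}}}."""
    result = value
    for part in reversed(dotted_key.split(".")):
        result = {part: result}
    return result


def merge(target, source):
    """Recursively merge the nested dict `source` into `target`."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            merge(target[key], value)
        else:
            target[key] = value
    return target


def build_commands(config_path, grid, out_root="runs"):
    """Return one `allennlp train` argument list per point in the grid."""
    keys = sorted(grid)
    commands = []
    for index, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        overrides = {}
        for key, value in zip(keys, values):
            merge(overrides, nest(key, value))
        commands.append([
            "allennlp", "train", config_path,
            "-s", f"{out_root}/run_{index}",
            "--overrides", json.dumps(overrides, sort_keys=True),
        ])
    return commands


# Hypothetical search grid; the keys must match your own config's structure.
grid = {
    "trainer.optimizer.lr": [0.001, 0.0001],
    "model.dropout": [0.1, 0.3],
}
commands = build_commands("experiment.jsonnet", grid)
for command in commands:
    # shlex.join quotes the JSON so the line can be pasted into a shell script.
    print(shlex.join(command))
```

Each printed line can go into a shell script or a cluster submission file as-is.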
Thanks for the answer.
@maxdel in September we're releasing software that will help execute jobs in the cloud and search the hyperparameter space. Until then, stay tuned!
Is there any support for hyperparameter tuning, such as Bayesian optimization?
Nope, unfortunately not. Contributions welcome! (Though this would be a really big feature, so if anyone wants to contribute this, talking with us first about the design would be a good idea.)
We're running into this now, think we can contribute. @schmmd Was there anything in progress already or design/implementation ideas we should be aware of?
@RXminuS we don't have anything in progress in AllenNLP to help out with Hyperparameter tuning. We were planning on launching beaker.org to the public so they could run jobs on our platform (we use this internally for hyperparameter tuning) but we've decided not to allow broad access to run jobs on the platform yet.
Sometime next year there may be a "Beaker Local" product for this--but we're still working through our annual planning.
Sorry for the false hope earlier!
That's alright. Our thinking was to create some kind of script/trainer to (continuously) generate a bunch of configs based on a hyperparameter optimizer. The base jsonnet files would then simply contain the constraints for that optimizer, and the optimizer would generate "experiments"... a bit like Helm charts really :-)
Then those configs could either be run in parallel across machines, aggregating the results, or just sequentially on your local machine, and nothing much about AllenNLP needs to change. In our team we just have a Docker container that we submit with the config, but something like Beaker could also be used to execute the scripts. The main problem is aggregating the results back and running the next step in the hyperparameter optimizer.
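To make the generate-then-aggregate loop concrete, here is a minimal sketch. It uses plain random search as a stand-in for a real optimizer, and everything in it is an assumption for illustration: the flat config keys (`lr`, `dropout`, `num_epochs`), the directory layout, and the expectation that each finished run leaves a `metrics.json` containing a `best_validation_loss` entry in its output directory. Training itself is faked so the example runs on its own.

```python
import json
import random
import tempfile
from pathlib import Path


def sample_config(base, space, rng):
    """Copy the base config and fill each searched key with a uniform draw."""
    config = dict(base)
    for key, (low, high) in space.items():
        config[key] = rng.uniform(low, high)
    return config


def write_experiments(base, space, out_dir, n_trials, seed=0):
    """Write n_trials sampled configs as JSON files and return their paths."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for index in range(n_trials):
        path = out_dir / f"trial_{index}.json"
        path.write_text(json.dumps(sample_config(base, space, rng), indent=2))
        paths.append(path)
    return paths


def best_run(runs_dir, metric="best_validation_loss"):
    """Read <run>/metrics.json for every run and return the lowest (name, value)."""
    results = {}
    for metrics_file in Path(runs_dir).glob("*/metrics.json"):
        metrics = json.loads(metrics_file.read_text())
        if metric in metrics:
            results[metrics_file.parent.name] = metrics[metric]
    if not results:
        return None
    return min(results.items(), key=lambda item: item[1])


with tempfile.TemporaryDirectory() as tmp:
    base = {"num_epochs": 5}                              # hypothetical fixed settings
    space = {"lr": (1e-4, 1e-2), "dropout": (0.0, 0.5)}   # hypothetical constraints
    config_paths = write_experiments(base, space, Path(tmp) / "configs", n_trials=3)
    generated = [json.loads(path.read_text()) for path in config_paths]

    # Stand-in for real training: fake one metrics.json per run directory.
    for index, loss in enumerate([0.9, 0.4, 0.7]):
        run_dir = Path(tmp) / "runs" / f"trial_{index}"
        run_dir.mkdir(parents=True)
        (run_dir / "metrics.json").write_text(json.dumps({"best_validation_loss": loss}))

    best = best_run(Path(tmp) / "runs")

print(best)
```

A smarter optimizer would replace `sample_config` and use the aggregated results to choose the next batch of configs; the file-based hand-off stays the same.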
For the optimizer we've been looking at:
But maybe you have some better ideas? How does Beaker pick params?
@RXminuS I think you can get a feel for what Beaker does on Beaker.org. When using Beaker--the user basically needs to do what you suggested to do the hyperparameter search (generate a bunch of configuration files). Beaker helps with executing the experiments on the cloud and aggregating the results into a single view.
We've actually come across https://polyaxon.com/ which seems to do everything we were building ourselves and more. We've made some initial "config generation scripts" and if it works nicely I'll make sure to write a Medium post about it :-) But we're going to pause our work on this feature as a PR for AllenNLP for the moment.
I've recently discovered that there have been attempts to use Weights & Biases (w&b) sweeps with AllenNLP.
There are several reasons for using w&b in your project: they are actively rolling out features, and they easily support PyTorch together with AllenNLP configs. They also have Bayesian search with hyperparameter importance.
Based on the above solution, I've arrived at a command like allennlp sweep /path/to/config.jsonnet /path/to/sweep_config.yaml -s /path/to/serialization_dir --n_trials 10 --include-package your_package, which is basically a wrapper that makes it work like another allennlp command. I can also write a wrapper that removes _all_ wandb-specific boilerplate from the sweep config, so the code becomes more foolproof.
It produces what one wants, however, there are several technical obstacles:
- store_true argparse arguments are a problem, because w&b uses only the --key=value format.
- I monkey-patch trainer._tensorboard_writer to write metrics to wandb every time they are logged to TensorBoard. However, due to some conflicts, I can log metrics only at the end of every train/validation phase instead of every batch/rolling average.

These problems are solvable (I'm using v0.9.0, but the same holds for v1.0.0), and I think that an easy-to-use integration with some experiment tracker, instead of the allentune repo (which is not so easy to use, at least for me), would be very useful for Show Your Work-like purposes.
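One possible way around the store_true obstacle is a small shim that rewrites `--flag=true` into a bare `--flag` before argparse sees it. This is only a sketch: the parser below is a hypothetical stand-in, not the real AllenNLP argument parser, and the accepted true/false spellings are my own assumption.

```python
import argparse

TRUE_VALUES = {"1", "true", "yes"}
FALSE_VALUES = {"0", "false", "no"}


def normalize_flags(argv, flag_names):
    """Rewrite --flag=true to --flag and drop --flag=false for store_true options."""
    normalized = []
    for arg in argv:
        name, separator, value = arg.partition("=")
        if separator and name in flag_names:
            lowered = value.lower()
            if lowered in TRUE_VALUES:
                normalized.append(name)
            elif lowered not in FALSE_VALUES:
                raise ValueError(f"expected a boolean for {name}, got {value!r}")
            # A false value means the flag is simply omitted.
        else:
            normalized.append(arg)
    return normalized


# Hypothetical parser standing in for the wrapped command.
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
parser.add_argument("--recover", action="store_true")

# Arguments in the --key=value form described above.
sweep_argv = ["--lr=0.01", "--recover=true"]
args = parser.parse_args(normalize_flags(sweep_argv, {"--recover"}))
print(args.lr, args.recover)
```

Regular `--key=value` options pass through untouched, so only the boolean flags need to be listed in `flag_names`.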
I can try to make a PR with this, or a separate package with this integration, but for the latter option I don't quite understand how to add a custom command to AllenNLP without modifying AllenNLP's code.
Hey @mojesty, thanks for the idea! We think the best place for this to live would be in a separate pip package. We're happy to give pointers on things if you have particular questions as you do this, and we can find a way to advertise this somewhere on our repo / website once it's ready.
One point on your monkey patching of the tensorboard writer - if you upgrade to 1.0, you can use our BatchCallback.
And adding a custom command to AllenNLP is pretty easy - you just have to register a Subcommand. Again, if you have questions, we're happy to answer them.