Hi all,
Thanks so much for putting this together, I'm very excited about it!
I am trying to run the default implementation of GAIL, but it is throwing an error about the input policy being the wrong type as shown below. Note that I already downloaded the default expert data for the deterministic Hopper expert, and that it is logging out the return stats on the data.
assert issubclass(self.policy, ActorCriticPolicy), "Error: the input policy for the TRPO model must be " \
AssertionError: Error: the input policy for the TRPO model must be an instance of common.policies.ActorCriticPolicy.
How to reproduce:
In command line: python -m stable_baselines.gail.run_mujoco
Please let me know if I'm missing something, thanks again!
Hi,
Thanks for reporting the bug. @hill-a, it seems that we forgot to refactor GAIL policies.
It's hard for us to debug GAIL for now, as we don't have a mujoco licence...
Feel free to submit a PR if you solve the issue ;)
To guide you a bit:
take a look at common.policies.py, you will need to tweak gail.mlp_policy.py a bit so it follows the structure of the base class.
Don't hesitate to ask if you have any additional questions ;)
Hi! I'm currently taking a look into this issue, will let you know if I can figure out the cause!
Edit: just explored codebase a bit more, think I should be able to get a fix ready soon.
Edit 2: Is the general design pattern for these scripts is not using the policy_fn? @araffin
Hi @krishpop ,
Thank you for working on that!
Is the general design pattern for these scripts is not using the
policy_fn?
I'm not sure to understand what you mean...
It seems in much of the original baselines code, and in some of the code in stable-baselines, there is a usage of a lambda argument called policy_fn that generates a policy of a particular type when called. However, it seems the usage of policy_fn is being phased out in stable-baselines, after looking into the trpo_mpi implementation, as well as the BaseRLModel, it seems now that there is a policy registry used to organize the different policies. In gail.run_mujoco, the policy_fn is still in use (see line 92) in order to pass the policy_fn generator to TRPO class and behavior_clone.learn method. This is the root of the bug as TRPO is expecting a policy that is already instantiated before creating the TRPO instance
Ok, now I understand what you mean ;)
So, yes this behavior was removed in favor of passing either a string (if the policy is registered) or a class.
EDIT: And I think you also need to update GAIL mlp policy so it follows the new interface.
Yes, thanks for clarifying. I will try fixing that, test, and send a PR soon
@flaurida @krishpop I started working on updating GAIL (branch gail-test). I'm also adapting it to make it work with other environments (not only mujoco ones) and check at the same time the performance.