Stable-baselines: Cannot run default implementation of GAIL

Created on 14 Oct 2018  路  7Comments  路  Source: hill-a/stable-baselines

Hi all,

Thanks so much for putting this together, I'm very excited about it!

I am trying to run the default implementation of GAIL, but it is throwing an error about the input policy being the wrong type as shown below. Note that I already downloaded the default expert data for the deterministic Hopper expert, and that it is logging out the return stats on the data.

assert issubclass(self.policy, ActorCriticPolicy), "Error: the input policy for the TRPO model must be " \ AssertionError: Error: the input policy for the TRPO model must be an instance of common.policies.ActorCriticPolicy.

How to reproduce:
In command line: python -m stable_baselines.gail.run_mujoco

Please let me know if I'm missing something, thanks again!

bug

All 7 comments

Hi,
Thanks for reporting the bug. @hill-a, it seems that we forgot to refactor GAIL policies.
It's hard for us to debug GAIL for now, as we don't have a mujoco licence...
Feel free to submit a PR if you solve the issue ;)

To guide you a bit:
take a look at common.policies.py, you will need to tweak gail.mlp_policy.py a bit so it follows the structure of the base class.
Don't hesitate to ask if you have any additional questions ;)

Hi! I'm currently taking a look into this issue, will let you know if I can figure out the cause!

Edit: just explored codebase a bit more, think I should be able to get a fix ready soon.
Edit 2: Is the general design pattern for these scripts is not using the policy_fn? @araffin

Hi @krishpop ,

Thank you for working on that!

Is the general design pattern for these scripts is not using the policy_fn?

I'm not sure to understand what you mean...

It seems in much of the original baselines code, and in some of the code in stable-baselines, there is a usage of a lambda argument called policy_fn that generates a policy of a particular type when called. However, it seems the usage of policy_fn is being phased out in stable-baselines, after looking into the trpo_mpi implementation, as well as the BaseRLModel, it seems now that there is a policy registry used to organize the different policies. In gail.run_mujoco, the policy_fn is still in use (see line 92) in order to pass the policy_fn generator to TRPO class and behavior_clone.learn method. This is the root of the bug as TRPO is expecting a policy that is already instantiated before creating the TRPO instance

Ok, now I understand what you mean ;)
So, yes this behavior was removed in favor of passing either a string (if the policy is registered) or a class.

EDIT: And I think you also need to update GAIL mlp policy so it follows the new interface.

Yes, thanks for clarifying. I will try fixing that, test, and send a PR soon

@flaurida @krishpop I started working on updating GAIL (branch gail-test). I'm also adapting it to make it work with other environments (not only mujoco ones) and check at the same time the performance.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

maystroh picture maystroh  路  3Comments

JankyOo picture JankyOo  路  3Comments

acyclics picture acyclics  路  3Comments

HareshKarnan picture HareshKarnan  路  3Comments

SerialIterator picture SerialIterator  路  3Comments