Ray: [rllib] MARWIL seems not to read data generated with a custom environment

Created on 10 Apr 2020 · 5 Comments · Source: ray-project/ray

I use RL for several tasks, solving them specifically with PPO. But there are several behaviours for which "human samples" are simple to provide, and I think they should help a lot in learning the required policy.
So I used the RLlib implementation of MARWIL, first with CartPole (no problems and good training), but when I use my own environment I run into problems.

The simplest test I cannot pass is this: learn a PPO policy and save samples from it, using a config in tune.run in RLlib with "output": "./training_data", and then use that data in MARWIL, like so:

tune.run(
    "MARWIL",
    stop={"timesteps_total": 5e6},
    config={
        "env": "env_sf_push_box",
        "beta": {
            "grid_search": [0],  # [0, 1]
        },
        "model": {
            "fcnet_hiddens": [128, 128],
        },
        # Whether to rollout "complete_episodes" or "truncate_episodes"
        # "batch_mode": "complete_episodes",
        # "input": "sampler",
        "input": "./training_data",
        "input_evaluation": ["simulation"],
    },
)
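For reference, the data-generation step described above might look like the following. This is a minimal sketch: only "output" and the environment name come from this issue; the stop criterion and the rest of the PPO config are assumptions.

```python
from ray import tune

tune.run(
    "PPO",
    stop={"timesteps_total": 1e6},  # assumed stop criterion, not from the issue
    config={
        "env": "env_sf_push_box",
        # Write collected experiences as JSON batch files, so they can later
        # be consumed via "input": "./training_data" in the MARWIL run.
        "output": "./training_data",
    },
)
```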

But the reward is not read: a lot of warnings such as `WARNING:root:NaN or Inf found in input tensor` are shown (the same ones you get when not adding "input_evaluation": ["simulation"]).
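As a hedged sanity check on the offline data itself: RLlib's JSON writer stores one sample batch per line, and off-policy reward estimators (the default when "input_evaluation" is not set to "simulation") need behaviour-policy probabilities in the logged batches; if those are missing, estimated rewards can come out as NaN. The file layout and field names below (e.g. "action_prob") are assumptions about that format, not confirmed in this issue; a fabricated batch file stands in for ./training_data/*.json.

```python
import json
import tempfile
from pathlib import Path


def batch_fields(path):
    """Return the sorted field names of the first sample batch in a JSON file."""
    with open(path) as f:
        return sorted(json.loads(f.readline()).keys())


# Fabricated demo file standing in for a real ./training_data/*.json output.
demo = Path(tempfile.mkdtemp()) / "output-0.json"
demo.write_text(json.dumps(
    {"obs": [[0.1]], "actions": [0], "rewards": [1.0], "action_prob": [0.9]}
) + "\n")

fields = batch_fields(demo)
print(fields)
print("action_prob" in fields)
```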

Is there something special I have to do with my environment (derived from gym.Env and usable with RLlib and PPO) to make it work with MARWIL?
Thank you very much,
Regards,
Fidel

ray 0.9.0.dev0, python 3.7.6

question

All 5 comments

Hi @fidelaznar, did you find a solution to this problem? If you did, please add a comment, as I am also working on a custom environment where I have to pass custom data to the environment.

Thank You,
Ajay

Sorry @Ajay-2007, I have not continued using MARWIL with RLlib due to this problem.
Regards,
Fidel

Hello, I want to train a model on offline datasets, using the CartPole example, but my reward is always NaN.
Do you know the reason?

@zzcNEU, you can set input_evaluation to 'simulation' in the trainer config.
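A minimal sketch of that suggestion, as a config fragment (keys as used earlier in this issue; the environment name comes from the original post):

```python
config = {
    "env": "env_sf_push_box",
    "input": "./training_data",
    # Estimate reward by rolling the learned policy out in the actual
    # environment, instead of off-policy estimation on the logged data
    # (which needs action probabilities in the batches).
    "input_evaluation": ["simulation"],
}
print(config["input_evaluation"])
```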

Hi @Ajay-2007, did you manage to solve this potential error? Because I may be having it too.
