Ray: [rllib] MARWIL seems not to read data generated with a custom environment

Created on 10 Apr 2020 · 5 Comments · Source: ray-project/ray

I use RL for several tasks, solving them specifically with PPO. But there are several behaviours for which "human samples" are simple to provide, and I think they should help a lot in learning the required policy.
So I used the RLlib implementation of MARWIL, first with CartPole (no problems and good training), but when I use my own environment I run into problems.

The simplest test I cannot pass is this: learn a PPO policy and save samples from it, using a config in tune.run in RLlib with "output": "./training_data", and then use that data in MARWIL, like so:

tune.run(
    "MARWIL",
    stop={"timesteps_total": 5e6},
    config={
        "env": "env_sf_push_box",
        "beta": {
            "grid_search": [0],  # [0, 1]
        },
        "model": {
            "fcnet_hiddens": [128, 128],
        },
        # Whether to rollout "complete_episodes" or "truncate_episodes"
        # "batch_mode": "complete_episodes",
        # "input": "sampler",
        "input": "./training_data",
        "input_evaluation": ["simulation"],
    },
)
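For reference, the data-generation step described above might look like the following. This is a minimal sketch: only "output" and the environment name come from this issue; the stop criterion and the rest of the PPO config are assumptions.

```python
from ray import tune

tune.run(
    "PPO",
    stop={"timesteps_total": 1e6},  # assumed stop criterion, not from the issue
    config={
        "env": "env_sf_push_box",
        # Write collected experiences as JSON batch files, so they can later
        # be consumed via "input": "./training_data" in the MARWIL run.
        "output": "./training_data",
    },
)
```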

But the reward is not read: a lot of warnings such as `WARNING:root:NaN or Inf found in input tensor` are shown (the same ones you get when not adding "input_evaluation": ["simulation"]).
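As a hedged sanity check on the offline data itself: RLlib's JSON writer stores one sample batch per line, and off-policy reward estimators (the default when "input_evaluation" is not set to "simulation") need behaviour-policy probabilities in the logged batches; if those are missing, estimated rewards can come out as NaN. The file layout and field names below (e.g. "action_prob") are assumptions about that format, not confirmed in this issue; a fabricated batch file stands in for ./training_data/*.json.

```python
import json
import tempfile
from pathlib import Path


def batch_fields(path):
    """Return the sorted field names of the first sample batch in a JSON file."""
    with open(path) as f:
        return sorted(json.loads(f.readline()).keys())


# Fabricated demo file standing in for a real ./training_data/*.json output.
demo = Path(tempfile.mkdtemp()) / "output-0.json"
demo.write_text(json.dumps(
    {"obs": [[0.1]], "actions": [0], "rewards": [1.0], "action_prob": [0.9]}
) + "\n")

fields = batch_fields(demo)
print(fields)
print("action_prob" in fields)
```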

Is there something special I have to do with my environment (derived from gym.Env and usable with RLlib and PPO) to make it work with MARWIL?
Thank you very much,
Regards,
Fidel

ray 0.9.0.dev0, python 3.7.6

question

All 5 comments

Hi @fidelaznar, did you find a solution to this problem? If you did, please add a comment, as I am also working on a custom environment where I have to pass custom data to the environment.

Thank You,
Ajay

Sorry @Ajay-2007, I have not continued using MARWIL with RLlib due to this problem.
Regards,
Fidel

Hello, I want to train a model on offline datasets, using the CartPole example, but my reward is always NaN.
Do you know the reason?

@zzcNEU, you can set input_evaluation to 'simulation' in the trainer config.
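A minimal sketch of that suggestion, as a config fragment (keys as used earlier in this issue; the environment name comes from the original post):

```python
config = {
    "env": "env_sf_push_box",
    "input": "./training_data",
    # Estimate reward by rolling the learned policy out in the actual
    # environment, instead of off-policy estimation on the logged data
    # (which needs action probabilities in the batches).
    "input_evaluation": ["simulation"],
}
print(config["input_evaluation"])
```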

Hi @Ajay-2007, did you manage to solve this potential error? Because I may be having it too.
