Ray: [rllib] Example usage of ExternalEnv/ExternalMultiAgentEnv

Created on 29 Jan 2020 · 8 comments · Source: ray-project/ray

I want to use RLlib in conjunction with the BlueSky ATM simulator. I'm having difficulty implementing a truly parallelized BlueSky environment, as BlueSky itself uses a server/client architecture. However, it allows for the creation of plugins, which the simulator runs every time-step. I'm looking for some example material on how to implement an ExternalMultiAgentEnv, or any tips/insights.

Thanks!

Labels: question, rllib, stale

All 8 comments

Check out https://github.com/ray-project/ray/blob/60d4d5e1aaa9fde3cf541ee335e284d05e75679c/rllib/tests/test_external_env.py

There is also the cartpole server example (look for the policy server class).

I think what you want is something like a BlueSky client inside an ExternalEnv, which pulls actions from the policies when needed and communicates them to the sim.
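
For reference, here is a minimal sketch of that idea. The start_episode / get_action / log_returns / end_episode calls are RLlib's ExternalEnv API; the BlueSkyClient wrapper, its methods, and the observation/action spaces are hypothetical placeholders for whatever your BlueSky scenario actually exposes:

```python
import gym
from ray.rllib.env.external_env import ExternalEnv


class BlueSkyEnv(ExternalEnv):
    def __init__(self, config):
        # Hypothetical spaces; substitute the ones your scenario provides.
        obs_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(8,))
        action_space = gym.spaces.Discrete(5)
        super().__init__(action_space, obs_space)
        # Hypothetical wrapper around BlueSky's network interface.
        self.client = BlueSkyClient(config["host"], config["port"])

    def run(self):
        # Called by RLlib in a background thread; loop over episodes here.
        while True:
            episode_id = self.start_episode()
            obs = self.client.reset()  # hypothetical
            done = False
            while not done:
                action = self.get_action(episode_id, obs)
                obs, reward, done = self.client.step(action)  # hypothetical
                self.log_returns(episode_id, reward)
            self.end_episode(episode_id, obs)
```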

Thanks for the comment! I was looking at the cartpole server example indeed.
As for the BlueSky client class, it's a bit messy as it uses continuous ZMQ polling to get updates, which creates a lot of overhead.
I'll try to think of something. Can I add questions to this issue at a later date?

Thanks!

Works like a charm, thanks! A few questions remain:
1) When does the trainer train? It cannot stop the simulator from providing data, so for on-policy methods, what exactly happens? Does it do a sort of "off-policy" on-policy training?
2) Should the simulator stop providing observations when training?
3) When creating multiple workers on the simulator side, the episode ID distinguishes the data sent. Does this allow multiple workers to collect data without their trajectory observations getting mixed up?

When does the trainer train? It cannot stop the simulator from providing data, so for on-policy methods, what exactly happens? Does it do a sort of "off-policy" on-policy training?

For synchronous on-policy methods, the sampler will actually stop "providing" action returns while the trainer is not sampling. So there is no off-policy behaviour. If you want to allow the simulation to keep querying actions during optimization phases, you can set sample_async: True (but this will introduce some off-policy data as noted).
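
As a rough illustration, assuming the Ray APIs of this era (circa 0.8.x; import paths and config keys differ in newer releases) and the hypothetical BlueSkyEnv sketched above, enabling this on a PPO trainer might look like:

```python
import ray
from ray.rllib.agents.ppo import PPOTrainer  # old-style import, pre Ray 2.x
from ray.tune.registry import register_env

ray.init()

# BlueSkyEnv is the hypothetical ExternalEnv subclass sketched earlier.
register_env("bluesky_external", lambda env_config: BlueSkyEnv(env_config))

trainer = PPOTrainer(
    env="bluesky_external",
    config={
        # Keep answering get_action() while the optimizer runs; this
        # introduces slightly off-policy data, as noted above.
        "sample_async": True,
    },
)

for _ in range(10):
    print(trainer.train())
```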

Should the simulator stop providing observations when training?

I think as long as the simulator can handle long delays in get_action() it should be fine.

When creating multiple workers on the simulator side, the episode ID distinguishes the data sent. Does this allow multiple workers to collect data without their trajectory observations getting mixed up?

That's right, you can have a large number of concurrent episodes distinguished by ID (as long as they eventually finish so their memory is freed).
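
For illustration, a single simulator-side worker using PolicyClient (as in the cartpole server example) might look like the sketch below; the episode-ID bookkeeping is RLlib's API, while the import path varies by Ray version and the bluesky_reset/bluesky_step hooks are hypothetical stand-ins for the plugin's own logic. Several such workers can run concurrently against one server, each with its own episode IDs:

```python
from ray.rllib.utils.policy_client import PolicyClient  # path differs in newer Ray versions

client = PolicyClient("http://localhost:9900")

episode_id = client.start_episode()  # unique ID for this worker's trajectory
obs = bluesky_reset()                # hypothetical BlueSky hook
done = False
while not done:
    action = client.get_action(episode_id, obs)
    obs, reward, done = bluesky_step(action)  # hypothetical BlueSky hook
    client.log_returns(episode_id, reward)
client.end_episode(episode_id, obs)  # lets the server free this episode's buffers
```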

Thanks a bunch! Your comments remind me of a mistake I need to fix.

Another question: when certain agents reach their goal state, I delete them from the simulation. Normally (when not using external envs) this requires me to send a done dict. Is there a way to implement this for ExternalMultiAgentEnv? Should I inherit from both PolicyClient and PolicyServer and modify log_returns to accept a done dict and process it?

Yeah, that seems to be a missing feature; you'll need to modify the env implementation.
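
A very rough sketch of that kind of modification is below: extend ExternalMultiAgentEnv so log_returns can also carry a per-agent done dict. The _get() helper and the episode's cur_done_dict attribute are internal RLlib details and are assumptions here, so check the source of your installed version before relying on them. Newer RLlib releases also appear to add a multiagent_done_dict argument to log_returns for exactly this purpose, so it's worth checking whether yours already has it:

```python
from ray.rllib.env.external_multi_agent_env import ExternalMultiAgentEnv


class DoneAwareExternalEnv(ExternalMultiAgentEnv):
    def log_returns(self, episode_id, reward_dict, info_dict=None, done_dict=None):
        # Record rewards/infos as usual via the parent implementation.
        super().log_returns(episode_id, reward_dict, info_dict)
        if done_dict:
            episode = self._get(episode_id)           # internal lookup, assumption
            episode.cur_done_dict.update(done_dict)   # mark removed agents as done
```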

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.
