Ray: [rllib] incorrect model output for DQN with torch and dueling=false

Created on 9 Jul 2020 · 7 comments · Source: ray-project/ray

What is the problem?

The output of the DQN model is not within the action space.

Something is wrong when constructing the torch model with dueling off: the output dimension of the model is equal to the last hidden-layer size passed in "fcnet_hiddens" instead of being the size of the action space.
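For illustration, here is a minimal plain-PyTorch sketch (not the actual RLlib code) of the symptom: if the Q-head stops at the hidden layer, its output width is the hidden size (32 here) rather than the action-space size (2 for CartPole-v1).

import torch
import torch.nn as nn

obs_dim, hidden, n_actions = 4, 32, 2

# Broken head: stops at the hidden layer, so outputs have width 32.
broken_head = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
# Expected head: a final Linear maps the hidden layer to the 2 actions.
fixed_head = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, n_actions))

obs = torch.randn(1, obs_dim)
print(broken_head(obs).shape)  # torch.Size([1, 32]) -- not per-action Q-values
print(fixed_head(obs).shape)   # torch.Size([1, 2])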

Ray version and other system information (Python version, TensorFlow version, OS):

  • ray==0.9.0.dev0
  • python 3.6.10
  • macOS

Reproduction (REQUIRED)

import ray
from ray import tune

ray.init()

config = {
    "env": "CartPole-v1",
    "num_workers": 1,
    "train_batch_size": 128,
    "learning_starts": 128,
    "model": {"fcnet_hiddens": [32]},
    "dueling": False ,
    "framework": "torch"
}

tune.run("DQN", name="MWE", config=config, stop={"training_iteration": 100})
  • [x] I have verified my script runs in a clean environment and reproduces the issue.
  • [x] I have verified the issue also occurs with the latest wheels.
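A quicker check than a full tune run (a sketch assuming the 0.9.x-era trainer API; DQNTrainer and compute_action are the standard Trainer entry points) is to build the trainer directly with the config above and verify the computed action lies inside the env's action space:

import gym
from ray.rllib.agents.dqn import DQNTrainer

# Reuses the config from the reproduction script (incl. "env": "CartPole-v1").
trainer = DQNTrainer(config=config)
env = gym.make("CartPole-v1")
action = trainer.compute_action(env.reset())
# With the bug, actions can fall outside {0, 1}; expect "True" once fixed.
print(action, env.action_space.contains(action))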
Labels: P2, bug, rllib, triage

All 7 comments

Can you just change the following in your rllib/agents/dqn/dqn_torch_model.py (c'tor)?

        advantage_module = nn.Sequential()
        value_module = nn.Sequential()

        # Dueling case: Build the shared (advantages and value) fc-network.
        if self.dueling:
            for i, n in enumerate(q_hiddens):
                advantage_module.add_module("dueling_A_{}".format(i),
                                            nn.Linear(ins, n))
                value_module.add_module("dueling_V_{}".format(i),
                                        nn.Linear(ins, n))
                # Add activations if necessary.
                if dueling_activation == "relu":
                    advantage_module.add_module("dueling_A_act_{}".format(i),
                                                nn.ReLU())
                    value_module.add_module("dueling_V_act_{}".format(i),
                                            nn.ReLU())
                elif dueling_activation == "tanh":
                    advantage_module.add_module("dueling_A_act_{}".format(i),
                                                nn.Tanh())
                    value_module.add_module("dueling_V_act_{}".format(i),
                                            nn.Tanh())

                # Add LayerNorm after each Dense.
                if add_layer_norm:
                    advantage_module.add_module("LayerNorm_A_{}".format(i),
                                                nn.LayerNorm(n))
                    value_module.add_module("LayerNorm_V_{}".format(i),
                                            nn.LayerNorm(n))
                ins = n

        # Actual Advantages layer (nodes=num-actions) and
        # value layer (nodes=1).
        advantage_module.add_module("A", nn.Linear(ins, action_space.n))
        value_module.add_module("V", nn.Linear(ins, 1))

That should fix it. Will PR now ...
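To sanity-check the patched construction logic outside RLlib, here is a standalone sketch (plain PyTorch plus gym spaces; the variable names mirror the constructor snippet above): with dueling off, the hidden loop is skipped and the final "A" layer alone maps straight to action_space.n.

import torch
import torch.nn as nn
from gym.spaces import Discrete

action_space = Discrete(2)      # CartPole-v1
ins, dueling, q_hiddens = 32, False, [32]

advantage_module = nn.Sequential()
if dueling:
    for i, n in enumerate(q_hiddens):
        advantage_module.add_module("dueling_A_{}".format(i), nn.Linear(ins, n))
        ins = n
# The final layer is now added unconditionally, as in the patch above.
advantage_module.add_module("A", nn.Linear(ins, action_space.n))

print(advantage_module(torch.randn(1, 32)).shape)  # torch.Size([1, 2])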

@MaximeBouton

Just saw this; I can give it a try tomorrow morning.

This PR fixes the issue: https://github.com/ray-project/ray/pull/9386
Will be merged today into master. Thanks for filing this!
Closing it now. Please feel free to re-open should this still not work on your end.

I installed the nightly version and it works, thanks for the quick fix!

This has been merged into master.

Awesome! Glad it's working. :)
