Ray: [rllib] incorrect model output for DQN with torch and dueling=false

Created on 9 Jul 2020 · 7 comments · Source: ray-project/ray

What is the problem?

The output of the DQN model is not within the action space.

Something is wrong when constructing the torch model with dueling off: the output dimension of the model is equal to the last hidden-layer size passed in "fcnet_hiddens" instead of being the size of the action space.
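For illustration, here is a minimal plain-PyTorch sketch (not the actual RLlib code) of the symptom: if the Q-head stops at the hidden layer, its output width is the hidden size (32 here) rather than the action-space size (2 for CartPole-v1).

import torch
import torch.nn as nn

obs_dim, hidden, n_actions = 4, 32, 2

# Broken head: stops at the hidden layer, so outputs have width 32.
broken_head = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
# Expected head: a final Linear maps the hidden layer to the 2 actions.
fixed_head = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                           nn.Linear(hidden, n_actions))

obs = torch.randn(1, obs_dim)
print(broken_head(obs).shape)  # torch.Size([1, 32]) -- not per-action Q-values
print(fixed_head(obs).shape)   # torch.Size([1, 2])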

Ray version and other system information (Python version, TensorFlow version, OS):

  • ray==0.9.0.dev0
  • python 3.6.10
  • macOS

Reproduction (REQUIRED)

import ray
from ray import tune

ray.init()

config = {
    "env": "CartPole-v1",
    "num_workers": 1,
    "train_batch_size": 128,
    "learning_starts": 128,
    "model": {"fcnet_hiddens": [32]},
    "dueling": False ,
    "framework": "torch"
}

tune.run("DQN", name="MWE", config=config, stop={"training_iteration": 100})
  • [x] I have verified my script runs in a clean environment and reproduces the issue.
  • [x] I have verified the issue also occurs with the latest wheels.
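A quicker check than a full tune run (a sketch assuming the 0.9.x-era trainer API; DQNTrainer and compute_action are the standard Trainer entry points) is to build the trainer directly with the config above and verify the computed action lies inside the env's action space:

import gym
from ray.rllib.agents.dqn import DQNTrainer

# Reuses the config from the reproduction script (incl. "env": "CartPole-v1").
trainer = DQNTrainer(config=config)
env = gym.make("CartPole-v1")
action = trainer.compute_action(env.reset())
# With the bug, actions can fall outside {0, 1}; expect "True" once fixed.
print(action, env.action_space.contains(action))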
Labels: P2, bug, rllib, triage

All 7 comments

Can you just change the following in your rllib/agents/dqn/dqn_torch_model.py (c'tor)?

        advantage_module = nn.Sequential()
        value_module = nn.Sequential()

        # Dueling case: Build the shared (advantages and value) fc-network.
        if self.dueling:
            for i, n in enumerate(q_hiddens):
                advantage_module.add_module("dueling_A_{}".format(i),
                                            nn.Linear(ins, n))
                value_module.add_module("dueling_V_{}".format(i),
                                        nn.Linear(ins, n))
                # Add activations if necessary.
                if dueling_activation == "relu":
                    advantage_module.add_module("dueling_A_act_{}".format(i),
                                                nn.ReLU())
                    value_module.add_module("dueling_V_act_{}".format(i),
                                            nn.ReLU())
                elif dueling_activation == "tanh":
                    advantage_module.add_module("dueling_A_act_{}".format(i),
                                                nn.Tanh())
                    value_module.add_module("dueling_V_act_{}".format(i),
                                            nn.Tanh())

                # Add LayerNorm after each Dense.
                if add_layer_norm:
                    advantage_module.add_module("LayerNorm_A_{}".format(i),
                                                nn.LayerNorm(n))
                    value_module.add_module("LayerNorm_V_{}".format(i),
                                            nn.LayerNorm(n))
                ins = n

        # Actual Advantages layer (nodes=num-actions) and
        # value layer (nodes=1).
        advantage_module.add_module("A", nn.Linear(ins, action_space.n))
        value_module.add_module("V", nn.Linear(ins, 1))

That should fix it. Will PR now ...
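To sanity-check the patched construction logic outside RLlib, here is a standalone sketch (plain PyTorch plus gym spaces; the variable names mirror the constructor snippet above): with dueling off, the hidden loop is skipped and the final "A" layer alone maps straight to action_space.n.

import torch
import torch.nn as nn
from gym.spaces import Discrete

action_space = Discrete(2)      # CartPole-v1
ins, dueling, q_hiddens = 32, False, [32]

advantage_module = nn.Sequential()
if dueling:
    for i, n in enumerate(q_hiddens):
        advantage_module.add_module("dueling_A_{}".format(i), nn.Linear(ins, n))
        ins = n
# The final layer is now added unconditionally, as in the patch above.
advantage_module.add_module("A", nn.Linear(ins, action_space.n))

print(advantage_module(torch.randn(1, 32)).shape)  # torch.Size([1, 2])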

@MaximeBouton

Just saw this; I can give it a try tomorrow morning.

This PR fixes the issue: https://github.com/ray-project/ray/pull/9386
Will be merged today into master. Thanks for filing this!
Closing it now. Please feel free to re-open should this still not work on your end.

I installed the nightly version and it works, thanks for the quick fix!

This has been merged into master.

Awesome! Glad it's working. :)
