Stable-baselines: DQN output is normalized?

Created on 24 Feb 2020 · 4 comments · Source: hill-a/stable-baselines

Describe the bug
The output of a DQN should be the estimated Q-values, but it seems that there is a softmax layer at the end of the DQN network: the values I get sum to 1. How is this DQN trained? Am I misunderstanding something?

Code example

import numpy as np

from stable_baselines import DQN
from stable_baselines.common.cmd_util import make_atari_env
from stable_baselines.common.vec_env import VecFrameStack

game_list = ['Pong', 'Breakout', 'SpaceInvaders', 'Seaquest', 'BeamRider', 'Qbert', 'Enduro']
method_list = ['dqn', 'ppo2', 'a2c', 'acktr']
method = method_list[0]

for game in game_list:
    # One Atari environment with 4 stacked frames
    env = make_atari_env('{}NoFrameskip-v4'.format(game), num_env=1, seed=0)
    env = VecFrameStack(env, n_stack=4)
    model = DQN.load("trained_agents/{}/{}NoFrameskip-v4.pkl".format(method, game))
    model.set_env(env)
    obs = env.reset()
    for i in range(1000):
        # Here I want to get the Q values
        actions = model.action_probability(obs)
        argmax_action = np.argmax(actions)
        action, _states = model.predict(obs)
        print(actions)
        print('the sum: {}'.format(np.sum(actions)))
        obs, rewards, dones, infos = env.step(action)
        episode_infos = infos[0].get('episode')

Output
[[0.16696884 0.16787794 0.16488545 0.16646059 0.16625851 0.16754872]]
the sum: 1.0
[[0.16700052 0.16733757 0.16515337 0.1665579 0.16647142 0.1674792 ]]
the sum: 1.0
[[0.16888289 0.16632704 0.16429803 0.16766034 0.16476472 0.168067 ]]
the sum: 1.0
[[0.16873147 0.1668533 0.1633186 0.16757944 0.16445793 0.16905922]]
the sum: 0.9999999403953552
[[0.16896197 0.16633949 0.16370982 0.16852172 0.16337387 0.16909312]]
the sum: 1.0
[[0.16926633 0.1685718 0.16038649 0.16665836 0.16892989 0.16618706]]
the sum: 0.9999998807907104
[[0.17271757 0.16515626 0.15838557 0.1654946 0.16886355 0.16938245]]
the sum: 1.0
[[0.17090748 0.16443412 0.15841582 0.16640182 0.17027445 0.16956638]]
the sum: 1.0

RTFM question

QuXinghuaNTU

All 4 comments

Hello,
You are not looking at the Q-values but at the action probabilities (cf. the documentation).

PS: as mentioned in the issue template, please format your code using a code block.

araffin on 24 Feb 2020

Hi Araffin,

Thanks for your reply.
How can I output the Q-values for a given state s_t after I have fully trained a DQN model?
From my understanding, the action probabilities are obtained by normalizing the output Q-values. However, I could not find how to access the Q-values themselves.
Thanks a lot.

QuXinghuaNTU on 24 Feb 2020
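
To make the normalization concrete, here is a minimal NumPy sketch. It assumes, as the question suggests, that the action probabilities are a softmax of the Q-values; the Q-values below are made-up illustrative numbers, not output from a trained agent. It shows why the printed rows sum to 1, why the greedy action is unchanged, and why the probabilities alone only determine the Q-values up to an additive constant.

```python
import numpy as np


def softmax(q):
    """Numerically stable softmax over the last axis."""
    z = q - q.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


# Hypothetical Q-values for one state of a 6-action game (illustrative only)
q_values = np.array([[1.20, 1.25, 1.10, 1.18, 1.17, 1.23]])
probs = softmax(q_values)

# Each row of probabilities sums to 1 and keeps the same greedy action.
total = probs.sum()
same_greedy = np.argmax(probs) == np.argmax(q_values)

# log(p_i) = q_i - logsumexp(q), so the log-probabilities differ from the
# Q-values by one constant per state; the absolute scale is lost.
recovered = np.log(probs)
offset = q_values - recovered  # the same value in every column
```

Under that assumption, nearly uniform probabilities such as those printed in the issue only indicate that the Q-values of the six actions are close to each other, not that they are small.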

You need to check the code for that: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/deepq/policies.py#L149

Example call: https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/deepq/dqn.py#L310

araffin on 24 Feb 2020

_, qvalues, _ = model.step_model.step(state, deterministic=True)

This line of code returns the Q-values from the DQN model.
I hope this helps other researchers who are looking for a solution.

QuXinghuaNTU on 26 Feb 2020

👍3 ❤1
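
The one-liner above can be wrapped in a small helper, sketched here. The helper name `q_values_and_greedy_action` is hypothetical, and the code assumes that `model.step_model.step` returns a 3-tuple whose second element is the Q-value array, as in the comment above. The stub classes are stand-ins so the sketch runs without TensorFlow or a trained agent; with a real model, pass the loaded `DQN` object and an observation from the environment instead.

```python
import numpy as np


def q_values_and_greedy_action(model, obs):
    """Return (q_values, greedy_actions) for a batch of observations.

    Assumes model.step_model.step(...) returns a 3-tuple whose second
    element holds the Q-values, as in the one-liner above.
    """
    _, q_values, _ = model.step_model.step(obs, deterministic=True)
    q_values = np.asarray(q_values)
    return q_values, q_values.argmax(axis=-1)


# Stand-in objects (not part of stable-baselines) that mimic the call shape.
class _StubStepModel:
    def step(self, obs, deterministic=True):
        q = np.tile(np.array([0.1, 0.7, 0.3]), (len(obs), 1))
        return q.argmax(axis=1), q, None


class _StubModel:
    step_model = _StubStepModel()


q, greedy = q_values_and_greedy_action(_StubModel(), np.zeros((2, 4)))
```

Taking the argmax of the returned Q-values gives the greedy action, which can be compared against the action returned by `model.predict(obs)`.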
