Stable-baselines: Is double Q network updated independently in SAC?

Created on 9 May 2020 · 3 Comments · Source: hill-a/stable-baselines

  • qf1, qf2, value_fn = self.policy_tf.make_critics(self.processed_obs_ph, self.actions_ph,
                                                     create_qf=True, create_vf=True)

  • qf1_loss = 0.5 * tf.reduce_mean((q_backup - qf1) ** 2)
    qf2_loss = 0.5 * tf.reduce_mean((q_backup - qf2) ** 2)

  • values_losses = qf1_loss + qf2_loss + value_loss

Are the two Q networks just initialized differently? If so, does it improve performance significantly?

question

All 3 comments

Yes, that's one of the tricks in SAC. See the Spinning Up description of SAC. You may close this issue if you have no further questions related to stable-baselines.

Are the two Q networks just initialized differently? If so, does it improve performance significantly?

I recommend reading the TD3 paper (which introduces clipped double Q-learning) and the SAC paper for a better understanding.
In short, yes: the two networks are initialized differently, and taking the minimum of their two estimates reduces overestimation of the Q-value.
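To make the idea concrete, here is a minimal pure-Python sketch of the pattern described above. It is not the stable-baselines implementation: the linear "critics" and the helper names (`make_linear_critic`, `q_values`, `mse_loss`, `clipped_double_q`) are hypothetical stand-ins for the real networks. It assumes each critic has its own parameters, so although the two losses are summed, each critic's loss depends only on its own weights.

```python
import random


def make_linear_critic(seed, n_inputs):
    """Hypothetical stand-in for a Q-network: linear weights drawn from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.1) for _ in range(n_inputs)]


def q_values(weights, batch):
    """Q estimate for each row of the batch (a row is observation + action features)."""
    return [sum(w * x for w, x in zip(weights, row)) for row in batch]


def mse_loss(targets, preds):
    """0.5 * mean squared error, the same form as qf1_loss / qf2_loss in the question."""
    return 0.5 * sum((t - p) ** 2 for t, p in zip(targets, preds)) / len(preds)


def clipped_double_q(qf1, qf2):
    """Element-wise minimum of the two estimates (clipped double Q-learning)."""
    return [min(a, b) for a, b in zip(qf1, qf2)]


# Toy batch of 4 transitions with 5 input features each, plus a shared target.
data_rng = random.Random(0)
batch = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]
q_backup = [data_rng.gauss(0.0, 1.0) for _ in range(4)]

# Same architecture, different seeds: the critics differ only by initialization.
w1 = make_linear_critic(seed=1, n_inputs=5)
w2 = make_linear_critic(seed=2, n_inputs=5)
qf1 = q_values(w1, batch)
qf2 = q_values(w2, batch)

# Both critics regress to the same target; the losses are simply summed.
qf1_loss = mse_loss(q_backup, qf1)
qf2_loss = mse_loss(q_backup, qf2)
critic_loss = qf1_loss + qf2_loss

# The pessimistic estimate is what would feed into the next target computation.
min_q = clipped_double_q(qf1, qf2)
```

Because `min_q` is never larger than either individual estimate, a target built from it is less prone to the upward bias that a single critic's errors would introduce.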

Thank you!!!
