Stable-baselines: Are the double Q networks updated independently in SAC?

Created on 9 May 2020 · 3 comments · Source: hill-a/stable-baselines

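```python
import tensorflow as tf  # TensorFlow 1.x

# Two Q-networks (qf1, qf2) and a state-value network are built from the same inputs
qf1, qf2, value_fn = self.policy_tf.make_critics(self.processed_obs_ph, self.actions_ph,
                                                 create_qf=True, create_vf=True)
# Each critic regresses toward the same target q_backup with its own squared-error loss
qf1_loss = 0.5 * tf.reduce_mean((q_backup - qf1) ** 2)
qf2_loss = 0.5 * tf.reduce_mean((q_backup - qf2) ** 2)
values_losses = qf1_loss + qf2_loss + value_loss  # summed and minimized together
```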
Are the two Q networks just initialized differently? If so, does that improve performance significantly?
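To make the "independently updated" part concrete, here is a toy sketch in plain Python (not stable-baselines code). The hypothetical linear critics `w1`, `w2` stand in for `qf1`, `qf2`, plain gradient descent stands in for the optimizer, and `value_loss` is left out. Assuming the two critics share no parameters, the gradient of the summed loss with respect to one critic's weights is exactly that critic's own loss gradient, which the finite-difference check confirms. The two critics start from different random initializations and are each pulled toward the same `q_backup` by their own error.

```python
import random


def q_value(w, phi):
    """Hypothetical linear critic: Q(s, a) = w . phi(s, a)."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def critic_loss(w, batch, q_backup):
    """0.5 * mean((q_backup - Q)^2); q_backup is treated as a fixed target."""
    sq_errs = [(y - q_value(w, phi)) ** 2 for phi, y in zip(batch, q_backup)]
    return 0.5 * sum(sq_errs) / len(sq_errs)


def critic_grad(w, batch, q_backup):
    """Analytic gradient of critic_loss with respect to w."""
    grad = [0.0] * len(w)
    for phi, y in zip(batch, q_backup):
        err = q_value(w, phi) - y
        for k, xk in enumerate(phi):
            grad[k] += err * xk / len(batch)
    return grad


rng = random.Random(0)
batch = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]  # features phi(s, a)
q_backup = [rng.uniform(-1, 1) for _ in range(8)]  # one shared target for both critics

# Same architecture, different random initializations.
w1 = [rng.gauss(0, 1) for _ in range(3)]
w2 = [rng.gauss(0, 1) for _ in range(3)]


def total_loss(wa, wb):
    """Analogue of values_losses = qf1_loss + qf2_loss (value_loss omitted)."""
    return critic_loss(wa, batch, q_backup) + critic_loss(wb, batch, q_backup)


def numeric_grad_wrt_first(wa, wb, eps=1e-6):
    """Central finite differences of total_loss with respect to wa only."""
    grad = []
    for k in range(len(wa)):
        up = list(wa)
        up[k] += eps
        down = list(wa)
        down[k] -= eps
        grad.append((total_loss(up, wb) - total_loss(down, wb)) / (2 * eps))
    return grad


# One gradient step per critic, each driven only by its own loss term.
lr = 0.1
w1_new = [w - lr * g for w, g in zip(w1, critic_grad(w1, batch, q_backup))]
w2_new = [w - lr * g for w, g in zip(w2, critic_grad(w2, batch, q_backup))]
```

Comparing `numeric_grad_wrt_first(w1, w2)` with `critic_grad(w1, batch, q_backup)` shows they match, and changing `w2` leaves the first one unchanged. This assumes disjoint parameters; the sketch does not model any layers the real policy might share between the critics.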



BlackDeal

All 3 comments

Yes, that's one of the tricks in SAC. See the Spinning Up description of SAC. You may close this issue if you have no further questions related to stable-baselines.
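The Spinning Up description of SAC writes the critic target with the clipped double-Q trick: both critics regress toward one shared target built from the smaller of the two target-network estimates. Below is a minimal single-transition sketch of that target in plain Python; the function name and the default `gamma`/`alpha` values are illustrative assumptions. The code in the question also builds a value function (`create_vf=True`, `value_loss`), so stable-baselines likely forms `q_backup` through that value network rather than exactly as below.

```python
def sac_q_target(reward, done, q1_next, q2_next, logp_next, gamma=0.99, alpha=0.2):
    """Soft Bellman target shared by both critics (Spinning Up-style SAC sketch).

    q1_next, q2_next: target-critic values Q1'(s', a'), Q2'(s', a') for an action
    a' freshly sampled from the current policy at s'; logp_next = log pi(a' | s').
    gamma and alpha defaults are hypothetical example values.
    """
    # Clipped double-Q: use the smaller of the two estimates.
    min_q_next = min(q1_next, q2_next)
    # Entropy term: subtract alpha * log-probability of the sampled action.
    soft_value_next = min_q_next - alpha * logp_next
    # No bootstrapping past a terminal transition (done = 1).
    return reward + gamma * (1.0 - done) * soft_value_next


# Both critics are then fit to the same y, each with its own squared-error loss.
y = sac_q_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0, logp_next=-1.0)
# y is approximately 1.0 + 0.99 * (4.0 + 0.2) = 5.158
```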

Miffyli on 9 May 2020


> Are two Q networks just initialized differently? If so, does it improve the effect significantly?

I recommend reading the TD3 paper (which introduces clipped double Q-learning) and the SAC paper for a better understanding.
In short, yes: they are initialized differently, and taking the minimum of the two Q-estimates reduces overestimation of the Q-value.
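A small simulation illustrates why taking the min helps. This is a toy model, not taken from either paper or from stable-baselines: every true action value is 0, each critic adds independent Gaussian noise, and the action is picked greedily with respect to the first critic (a stand-in for a policy trained to maximize it). The greedy pick favors actions whose noise happens to be positive, so the single-critic estimate is biased upward; taking the min with a second, independently initialized critic removes most of that bias, at the cost of possibly underestimating slightly.

```python
import random


def overestimation_demo(n_actions=10, trials=5000, noise_std=1.0, seed=0):
    """Toy bias comparison: single critic vs. min of two critics.

    All true action values are 0, so the average estimate equals the bias.
    Returns (single_bias, clipped_bias).
    """
    rng = random.Random(seed)
    single_total = 0.0
    clipped_total = 0.0
    for _ in range(trials):
        # Independent noisy estimates from two differently initialized critics.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(n_actions)]
        best = max(range(n_actions), key=lambda a: q1[a])  # greedy w.r.t. critic 1
        single_total += q1[best]  # tends to pick up positive noise
        clipped_total += min(q1[best], q2[best])  # clipped double-Q estimate
    return single_total / trials, clipped_total / trials


single_bias, clipped_bias = overestimation_demo()
```

With these settings `single_bias` comes out well above zero (around the expected maximum of ten standard normal draws) while `clipped_bias` stays close to zero. The toy also hints at why different initializations matter: if both critics were identical, `min(q1, q2)` would just equal `q1` and the bias would not shrink.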