Stable-baselines: InvalidArgumentError: You must feed a value for placeholder tensor 'model/batch_normalization_1/keras_learning_phase' with dtype bool error in Custom Policy

Created on 14 Jun 2019 · 2 Comments · Source: hill-a/stable-baselines

Hi,

I was implementing a Custom Policy (including CNNs and LSTMs) in the following manner:

    with tf.variable_scope("model", reuse=reuse):

        bs_number, seq_len, num_features = self.processed_obs.shape
        bs = tf.shape(self.processed_obs)[0]

        # CNN acts as a feature extractor
        embed = Sequential([
            Conv2D(input_shape=(1, num_features, seq_len), 
                   filters=8, 
                   kernel_size=(3,3), 
                   strides=(2, 1), 
                   padding='same', 
                   data_format="channels_first", 
                   use_bias=False),
            BatchNormalization(axis=1, scale=False),
            Activation('relu'),
            Conv2D(filters=16, 
                   kernel_size=(3,3), 
                   strides=(1, 1), 
                   padding='same', 
                   data_format="channels_first", 
                   use_bias=False),
            BatchNormalization(axis=1)
        ])

        # Converts (bs, seq_len, feat) -> (bs, 1, seq_len, feat) -> (bs, 1, feat, seq_len) -> (bs, c, feat, seq_len)
        cnn_embedding = embed(
            tf.transpose(
                tf.expand_dims(self.processed_obs, axis=1), perm=[0,1,3,2]))


        new_num_features = cnn_embedding.shape[1] * cnn_embedding.shape[2]

        # LSTM is used for time-series prediction (3-layered)
        lstm = Sequential([
            LSTM(units=128, input_shape=(int(seq_len), int(new_num_features)), return_sequences=True),
            LSTM(units=256, return_sequences=True),
            LSTM(units=256)
        ])

        # Converts (bs, c, feat, seq_len) -> (bs, c*feat, seq_len) -> (bs, seq_len, c*feat) -> (bs, 256)
        feature_layer = lstm(
            tf.transpose(
                tf.reshape(cnn_embedding, [bs, new_num_features, seq_len]), perm=[0,2,1]))

        print(feature_layer.shape)

        # The non-shared components (MLP)
        pi_h = feature_layer
        for i, layer_size in enumerate([128, 128]):
            pi_h = tf.layers.dense(pi_h, layer_size, name='pi_fc' + str(i))
        pi_latent = pi_h # (bs, 128)

        print(pi_latent.shape)

        vf_h = feature_layer
        for i, layer_size in enumerate([32, 32]):
            vf_h = tf.layers.dense(vf_h, layer_size, name='vf_fc' + str(i))
        value_fn = tf.layers.dense(vf_h, 1, name='vf')
        vf_latent = vf_h # (bs, 32)

        print(vf_latent.shape)

        self._proba_distribution, self._policy, self.q_value = \
            self.pdtype.proba_distribution_from_latent(pi_latent, vf_latent, init_scale=0.01)

    self._value_fn = value_fn
    self._setup_init()

def step(self, obs, state=None, mask=None, deterministic=False):
    if deterministic:
        action, value, neglogp = self.sess.run([self.deterministic_action, self.value_flat, self.neglogp],
                                               {self.obs_ph: obs})
    else:
        action, value, neglogp = self.sess.run([self.action, self.value_flat, self.neglogp],
                                               {self.obs_ph: obs})
    return action, value, self.initial_state, neglogp

def proba_step(self, obs, state=None, mask=None):
    return self.sess.run(self.policy_proba, {self.obs_ph: obs})

def value(self, obs, state=None, mask=None):
    return self.sess.run(self.value_flat, {self.obs_ph: obs})

I'm getting the following error when control enters the env.reset() function:

InvalidArgumentError: You must feed a value for placeholder tensor 'model/batch_normalization_1/keras_learning_phase' with dtype bool

[[Node: model/batch_normalization_1/keras_learning_phase = Placeholder[dtype=DT_BOOL, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]

I tried a couple of things, like using Keras backend and setting

K.set_learning_phase(True)

before constructing the model, but that didn't seem to work either.

I ran the same code without the 2 BatchNorm layers and it worked. Is there any issue with the way I'm using BatchNorm?
I'm getting a similar error with Dropout too.

Any help will be greatly appreciated.

Thanks! :)

question

Most helpful comment

Hello,

Before answering, I would really recommend that you learn more about the tools you are using, notably TensorFlow and batch norm.

InvalidArgumentError: You must feed a value for placeholder tensor 'model/batch_normalization_1/keras_learning_phase' with dtype bool

So, the error is pretty explicit: in the TensorFlow graph, there is a placeholder "model/batch_normalization_1/keras_learning_phase" that is waiting for a value so that other operations in the graph (here, the batch norm operation) can be computed.
That value needs to be fed in the sess.run() call, which is not the case here.
Because we are using TensorFlow under the hood, and not Keras, calling K.set_learning_phase(True) won't change anything. Keras is not used for training, so it cannot feed the input.
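To make the mechanism concrete, here is a toy, TensorFlow-free sketch of what a placeholder and a feed dictionary do. The names `Placeholder`, `Op`, `run`, and `toy_norm` are made up for this illustration and are not part of any library. It only shows that a graph input without a fed value makes evaluation fail, much like the error above.

```python
class Placeholder:
    """A named graph input that has no value until it is fed."""

    def __init__(self, name):
        self.name = name


class Op:
    """A node computed from other nodes by a plain Python function."""

    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs


def run(node, feed_dict):
    """Evaluate `node`, looking up every placeholder in `feed_dict`."""
    if isinstance(node, Placeholder):
        if node not in feed_dict:
            raise ValueError(
                "You must feed a value for placeholder %r" % node.name)
        return feed_dict[node]
    return node.fn(*(run(i, feed_dict) for i in node.inputs))


def toy_norm(x, training):
    """Stand-in for batch norm: behaves differently in train and test mode."""
    return x - 1.0 if training else x


obs_ph = Placeholder("obs")
learning_phase = Placeholder("keras_learning_phase")
out = Op(toy_norm, obs_ph, learning_phase)

# Like step() in the question: only the observation is fed, so this fails.
try:
    run(out, {obs_ph: 3.0})
except ValueError as err:
    error_message = str(err)

# Feeding the second placeholder as well lets the node be computed.
result = run(out, {obs_ph: 3.0, learning_phase: False})
```

In the real policy, the analogous change would be to add the learning-phase tensor to the dictionary passed to `self.sess.run` in `step`, `proba_step`, and `value`. That assumes the Keras backend in use exposes the placeholder through `K.learning_phase()`, which is worth verifying for the exact Keras version involved.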

I ran the same code without the 2 BatchNorm layers and it worked

Now, about batch norm itself. If you read about how it works, I don't think it makes sense in an RL setting.
In a supervised learning setting, you collect statistics (moving averages of the mean and variance) during training and use those during testing.
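As a rough sketch of that train/test split for a single scalar feature: during training, the per-batch mean and variance are folded into running averages, and at test time only those running values are used. The momentum value, the initial statistics, and the example batches below are arbitrary choices for the illustration, not values taken from any library.

```python
MOMENTUM = 0.9  # hypothetical value chosen for this sketch
EPS = 1e-3      # small constant that avoids division by zero


def batch_stats(batch):
    """Return the mean and (population) variance of a list of scalars."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return mean, var


# Running statistics before any training step.
running_mean, running_var = 0.0, 1.0

# "Train" mode: every batch updates the moving averages.
training_batches = [[1.0, 3.0], [2.0, 4.0], [0.0, 6.0]]
for batch in training_batches:
    mean, var = batch_stats(batch)
    running_mean = MOMENTUM * running_mean + (1.0 - MOMENTUM) * mean
    running_var = MOMENTUM * running_var + (1.0 - MOMENTUM) * var


def normalize_at_test_time(x):
    """"Test" mode: use the stored statistics, never the current batch."""
    return (x - running_mean) / (running_var + EPS) ** 0.5
```

The learning-phase flag is what selects between the two branches, which is why the graph cannot run until that flag has a value.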

In an RL setting, you need to collect samples before training, and usually you have to collect them one by one.
However, a minibatch of size one has a variance of zero (and that's a problem).
So you could collect samples in "test" mode with random (or fixed) statistics, switch to "train" mode for the gradient step, and then switch back to "test" mode to collect the next samples.
This seems tricky, so I would definitely avoid that.
In fact, if you look at deeper architectures for RL (see #367), no batch norm is used.
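The batch-of-one problem can be checked by hand. The sketch below normalizes a single scalar feature, first with the batch's own statistics ("train" mode) and then with fixed statistics ("test" mode). The function names and the fixed mean and variance are assumptions made for the illustration.

```python
EPS = 1e-3  # small constant that avoids division by zero


def batch_norm_train(batch):
    """Normalize a batch of scalars with its own mean and variance."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + EPS) ** 0.5 for x in batch]


def batch_norm_test(batch, running_mean, running_var):
    """Normalize with statistics fixed in advance."""
    return [(x - running_mean) / (running_var + EPS) ** 0.5 for x in batch]


# "Train" mode with one sample: the mean equals the sample and the
# variance is zero, so very different observations become identical.
single_a = batch_norm_train([5.0])
single_b = batch_norm_train([-120.0])

# "Test" mode with fixed statistics keeps the two observations apart.
fixed_a = batch_norm_test([5.0], running_mean=0.0, running_var=1.0)
fixed_b = batch_norm_test([-120.0], running_mean=0.0, running_var=1.0)
```

Here `single_a` and `single_b` both come out as `[0.0]`, while `fixed_a` and `fixed_b` differ. This is why sample collection would have to run in "test" mode.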

Hi,

Thanks for the answer!

My initial model without Batch Norm was overfitting. So I was reading this paper on generalization techniques in RL (https://arxiv.org/pdf/1812.02341.pdf), and in Section 5.3 the authors mention that Batch Norm offers a significant boost in performance. Hence, I was looking to use it.
