Transformers: the output type of TFBertModel is weird

Created on 26 Nov 2019 · 7 comments · Source: huggingface/transformers

import tensorflow as tf
from transformers import TFBertModel

model = TFBertModel.from_pretrained('bert-base-chinese')
model.summary()

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

predictions = model.predict(validation_input_ids)
print(type(predictions))
print(predictions.shape)   
<class 'list'>
AttributeError: 'list' object has no attribute 'shape'



predictions = predictions[0]
print(predictions.shape)



(8359, 512, 768)
wontfix

Most helpful comment

This is because BERT pretraining involves two tasks: masked token prediction and next sentence prediction. The first needs the hidden state of every token (shape: [batch_size, sequence_length, hidden_size]); the second needs an embedding of the whole sequence (shape: [batch_size, hidden_size]).

There is also room left for anyone who wants all the hidden states from every layer inside the model (which may represent different levels of abstraction besides the last one) or the attention matrices.

All 7 comments

This is because BERT pretraining involves two tasks: masked token prediction and next sentence prediction. The first needs the hidden state of every token (shape: [batch_size, sequence_length, hidden_size]); the second needs an embedding of the whole sequence (shape: [batch_size, hidden_size]).

There is also room left for anyone who wants all the hidden states from every layer inside the model (which may represent different levels of abstraction besides the last one) or the attention matrices.
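
In code, that means indexing the tuple that TFBertModel returns. A minimal sketch (using a dummy batch of padded token IDs purely to illustrate the output structure):

import tensorflow as tf
from transformers import TFBertModel

model = TFBertModel.from_pretrained('bert-base-chinese')

# Dummy batch of token IDs, shape (1, 512), just to show the shapes of the outputs
dummy_input_ids = tf.constant([[101, 102] + [0] * 510])

outputs = model(dummy_input_ids)
sequence_output = outputs[0]   # per-token hidden states: (batch_size, sequence_length, hidden_size)
pooled_output = outputs[1]     # whole-sequence embedding: (batch_size, hidden_size)

print(sequence_output.shape)   # (1, 512, 768)
print(pooled_output.shape)     # (1, 768)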

This is because BERT pretraining involves two tasks: masked token prediction and next sentence prediction. The first needs the hidden state of every token (shape: [batch_size, sequence_length, hidden_size]); the second needs an embedding of the whole sequence (shape: [batch_size, hidden_size]).

Because of this, if I want to use tf.keras to build custom layers on top of TFBertModel, I have to add this particular line:
bert = bert[0]

from tensorflow.keras.layers import Input, Dropout, Flatten, Dense
from tensorflow.keras.models import Model
from transformers import TFBertModel

input_layer = Input(shape=(512,), dtype='int64')
bert = TFBertModel.from_pretrained('bert-base-chinese')(input_layer)

bert = bert[0]               # take the sequence output; I have to add this particular line

dropout = Dropout(0.1)(bert)
flat = Flatten()(dropout)
classifier = Dense(units=5)(flat)
model = Model(inputs=input_layer, outputs=classifier)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 512)]             0         
_________________________________________________________________
tf_bert_model (TFBertModel)  ((None, 512, 768), (None, 102267648 
_________________________________________________________________
dropout_37 (Dropout)         (None, 512, 768)          0         
_________________________________________________________________
flatten (Flatten)            (None, 393216)            0         
_________________________________________________________________
dense (Dense)                (None, 5)                 1966085   
=================================================================
Total params: 104,233,733
Trainable params: 104,233,733
Non-trainable params: 0
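
As a usage sketch, continuing from the snippet above (train_input_ids, train_labels, and validation_input_ids are placeholders for your own data; 5 classes are assumed, matching the Dense layer):

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
model.fit(train_input_ids, train_labels, batch_size=8, epochs=2)

predictions = model.predict(validation_input_ids)   # now a single array of logits
print(predictions.shape)                            # (num_examples, 5)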

That's right. But for sentence-level classification, I recommend you use the embedding of the whole sequence instead:

bert = bert[1] # instead of  bert = bert[0]   

just like the official sequence classification head does in the TFBertForSequenceClassification class at
https://github.com/huggingface/transformers/blob/master/transformers/modeling_tf_bert.py
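
For context, a simplified sketch of that pattern (assumed for illustration, not copied from the library source): take the pooled output, apply dropout, then a dense classifier.

import tensorflow as tf
from transformers import TFBertModel

bert = TFBertModel.from_pretrained('bert-base-chinese')
dropout = tf.keras.layers.Dropout(0.1)
classifier = tf.keras.layers.Dense(units=5)    # 5 labels, matching the example above

def classify(input_ids, training=False):
    outputs = bert(input_ids)
    pooled_output = outputs[1]                          # (batch_size, hidden_size)
    pooled_output = dropout(pooled_output, training=training)
    return classifier(pooled_output)                    # (batch_size, num_labels)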

bert = bert[1] # instead of  bert = bert[0]   

May I ask why?
It looks like it reduces the number of features going into the Flatten layer.
It doesn't look like the whole sequence.

input_layer = Input(shape = (512,), dtype='int64')  
bert = TFBertModel.from_pretrained('bert-base-chinese')(input_layer)
bert = bert[1]    
dropout = Dropout(0.1)(bert)
flat = Flatten()(dropout)
classifier = Dense(units=5)(flat)                  
model = Model(inputs=input_layer, outputs=classifier)
model.summary()
Model: "model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         [(None, 512)]             0         
_________________________________________________________________
tf_bert_model_5 (TFBertModel ((None, 512, 768), (None, 102267648 
_________________________________________________________________
dropout_225 (Dropout)        (None, 768)               0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 768)               0         
_________________________________________________________________
dense_5 (Dense)              (None, 5)                 3845      
=================================================================
Total params: 102,271,493
Trainable params: 102,271,493
Non-trainable params: 0
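
For what it's worth, the (None, 768) shape is expected: BERT's pooled output is derived from the hidden state of the first ([CLS]) token, passed through a learned dense layer with tanh activation. A conceptual sketch with stand-in tensors (not the library's internals):

import tensorflow as tf

sequence_output = tf.random.normal((2, 512, 768))          # stand-in for bert[0]
cls_token = sequence_output[:, 0]                          # (2, 768): first token per example
pooler = tf.keras.layers.Dense(768, activation='tanh')     # learned pooling layer
pooled_output = pooler(cls_token)                          # same shape as bert[1]

print(pooled_output.shape)   # (2, 768)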

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

It is still a problem

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
