import tensorflow as tf
from transformers import TFBertModel

model = TFBertModel.from_pretrained('bert-base-chinese')
model.summary()
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
predictions = model.predict(validation_input_ids)
print(type(predictions))
print(predictions.shape)
<class 'list'>
AttributeError: 'list' object has no attribute 'shape'
predictions = predictions[0]
print(predictions.shape)
(8359, 512, 768)
This is because in the BERT pretraining process there are two tasks: masked token prediction and next sentence prediction. The first needs the hidden state of each token (shape: [batch_size, sequence_length, hidden_size]); the second needs an embedding of the whole sequence (shape: [batch_size, hidden_size]).
There are also output slots left for anyone who wants all the hidden states from each layer inside the model (which may represent different levels of abstraction besides the last one) or the attention matrices.
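For reference, here is a minimal sketch of pulling all four of those outputs out of TFBertModel. It assumes a transformers version where output_hidden_states / output_attentions can be passed through from_pretrained to the config; validation_input_ids is the same tensor as above.

import tensorflow as tf
from transformers import TFBertModel

# Ask the model to also return per-layer hidden states and attention matrices.
model = TFBertModel.from_pretrained(
    'bert-base-chinese',
    output_hidden_states=True,
    output_attentions=True,
)

outputs = model(validation_input_ids[:8])  # call on a small batch to inspect the structure
last_hidden_state = outputs[0]  # [batch_size, sequence_length, hidden_size]
pooled_output = outputs[1]      # [batch_size, hidden_size]
all_hidden_states = outputs[2]  # tuple: embedding output + one tensor per layer
all_attentions = outputs[3]     # tuple: one attention matrix per layer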
Because of this, if I want to use tf.keras to build custom layers on top of TFBertModel, I have to add this particular line:
bert = bert[0]
from tensorflow.keras.layers import Input, Dropout, Flatten, Dense
from tensorflow.keras.models import Model

input_layer = Input(shape=(512,), dtype='int64')
bert = TFBertModel.from_pretrained('bert-base-chinese')(input_layer)
bert = bert[0] # I have to add this particular line
dropout = Dropout(0.1)(bert)
flat = Flatten()(dropout)
classifier = Dense(units=5)(flat)
model = Model(inputs=input_layer, outputs=classifier)
model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 512)] 0
_________________________________________________________________
tf_bert_model (TFBertModel) ((None, 512, 768), (None, 102267648
_________________________________________________________________
dropout_37 (Dropout) (None, 512, 768) 0
_________________________________________________________________
flatten (Flatten) (None, 393216) 0
_________________________________________________________________
dense (Dense) (None, 5) 1966085
=================================================================
Total params: 104,233,733
Trainable params: 104,233,733
Non-trainable params: 0
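If it helps, here is a minimal sketch of compiling and fitting this model with the same settings as at the top of the thread; train_input_ids and train_labels are hypothetical placeholders for your own data, not something from the original post.

optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')

model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
model.fit(train_input_ids, train_labels, batch_size=8, epochs=2, validation_split=0.1)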
That's right. But for sentence-level classification, I recommend you use the embedding of the whole sequence:
bert = bert[1] # instead of bert = bert[0]
Just like what the official sequence classification does in the TFBertForSequenceClassification class at
https://github.com/huggingface/transformers/blob/master/transformers/modeling_tf_bert.py
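Roughly, the head in that class works like the sketch below. This is a paraphrase from memory rather than the exact source, so names, arguments, and initializers differ from the real file; the point is that it classifies from the pooled output, outputs[1].

import tensorflow as tf
from transformers import TFBertModel

class BertSentenceClassifier(tf.keras.Model):
    # Paraphrased sketch of the TFBertForSequenceClassification head (not the real class).
    def __init__(self, num_labels=5, dropout_rate=0.1):
        super().__init__()
        self.bert = TFBertModel.from_pretrained('bert-base-chinese')
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.classifier = tf.keras.layers.Dense(num_labels)

    def call(self, input_ids, training=False):
        outputs = self.bert(input_ids)
        pooled_output = outputs[1]                            # [batch_size, hidden_size]
        pooled_output = self.dropout(pooled_output, training=training)
        return self.classifier(pooled_output)                 # logits: [batch_size, num_labels]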
May I ask why?
It looks like it just reduces the number of features going into the Flatten layer.
It doesn't look like it uses the whole sequence.
input_layer = Input(shape = (512,), dtype='int64')
bert = TFBertModel.from_pretrained('bert-base-chinese')(input_layer)
bert = bert[1]
dropout = Dropout(0.1)(bert)
flat = Flatten()(dropout)
classifier = Dense(units=5)(flat)
model = Model(inputs=input_layer, outputs=classifier)
model.summary()
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_6 (InputLayer) [(None, 512)] 0
_________________________________________________________________
tf_bert_model_5 (TFBertModel ((None, 512, 768), (None, 102267648
_________________________________________________________________
dropout_225 (Dropout) (None, 768) 0
_________________________________________________________________
flatten_3 (Flatten) (None, 768) 0
_________________________________________________________________
dense_5 (Dense) (None, 5) 3845
=================================================================
Total params: 102,271,493
Trainable params: 102,271,493
Non-trainable params: 0
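The parameter counts in the two summaries make the difference concrete; here is a quick sanity check of the arithmetic (plain Python, matching the numbers above).

hidden_size, seq_len, num_labels = 768, 512, 5

# Dense on the flattened token-level output: one weight per class for each of
# the 512 * 768 flattened features, plus one bias per class.
flat_params = seq_len * hidden_size * num_labels + num_labels    # 1,966,085

# Dense on the pooled sequence embedding: only hidden_size features per class.
pooled_params = hidden_size * num_labels + num_labels            # 3,845

print(flat_params, pooled_params)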
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
It is still a problem
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.