Tensor2tensor: how do I use just the Transformer encoder layer?

Created on 17 May 2018 · 13 comments · Source: tensorflow/tensor2tensor

I just want to use the Transformer encoder.
However, the tensor2tensor framework is too complex for me: I spent almost two days on it as a beginner before giving up.
I also tried a third-party implementation, https://www.github.com/kyubyong/transformer, but it may have a bug: my model could not converge and performed even worse than a simple average-embedding model.

Could you please provide the encoder in TensorFlow as a simple function call? I don't need TPU/CTPU support. Thanks.


doubler


I think T2T is modular enough that you can use just the Transformer encoder in a different TF project.
If you want an easier-to-understand (but still functional) Transformer implementation, see e.g. NeuralMonkey or OpenNMT-py.

martinpopel on 17 May 2018

Thanks @martinpopel.
I understand the implementation, but I want a pure transformer.py that depends only on TensorFlow, just as OpenNMT-py depends only on PyTorch.
NeuralMonkey imports too many other modules, and OpenNMT-py is not a TensorFlow library.
So it's still not easy to use. :(

doubler on 18 May 2018
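For readers who, like the original poster, want to see what an encoder layer computes without the T2T machinery, here is a minimal NumPy sketch of one encoder layer: multi-head self-attention followed by a ReLU feed-forward block, each wrapped in a pre-layer-norm residual connection. It is an illustration under stated assumptions, not T2T's implementation: it omits biases, dropout, padding masks, positional encodings and the layer-norm gain/bias parameters, and all function names (`encoder_layer`, `init_params`, etc.) are hypothetical. Porting it to TensorFlow would mean swapping the NumPy ops for their `tf` counterparts.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector to zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_self_attention(x, wq, wk, wv, wo, num_heads):
    # x: [batch, timesteps, hidden]; every weight matrix is [hidden, hidden].
    batch, length, hidden = x.shape
    if hidden % num_heads != 0:
        raise ValueError("hidden size must be divisible by num_heads")
    depth = hidden // num_heads

    def split_heads(t):
        # [batch, length, hidden] -> [batch, heads, length, depth]
        return t.reshape(batch, length, num_heads, depth).transpose(0, 2, 1, 3)

    q, k, v = split_heads(x @ wq), split_heads(x @ wk), split_heads(x @ wv)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(depth)  # [batch, heads, length, length]
    context = softmax(scores) @ v  # [batch, heads, length, depth]
    context = context.transpose(0, 2, 1, 3).reshape(batch, length, hidden)
    return context @ wo


def encoder_layer(x, params, num_heads):
    # Pre-norm residual blocks: x + Sublayer(LayerNorm(x)).
    # (The 2017 paper used post-norm: LayerNorm(x + Sublayer(x)).)
    h = x + multihead_self_attention(layer_norm(x), params["wq"], params["wk"],
                                     params["wv"], params["wo"], num_heads)
    ffn = np.maximum(0.0, layer_norm(h) @ params["w1"]) @ params["w2"]  # ReLU FFN
    return h + ffn


def init_params(hidden, filter_size, rng):
    def dense(n_in, n_out):
        return rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    return {"wq": dense(hidden, hidden), "wk": dense(hidden, hidden),
            "wv": dense(hidden, hidden), "wo": dense(hidden, hidden),
            "w1": dense(hidden, filter_size), "w2": dense(filter_size, hidden)}


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 10, 64))  # [batch, timesteps, hidden]
params = init_params(hidden=64, filter_size=256, rng=rng)
y = encoder_layer(x, params, num_heads=8)
print(y.shape)  # (2, 10, 64)
```

The output keeps the input's shape, which is what lets encoder layers be stacked; a full encoder would add positional information to `x` first and apply several such layers in sequence.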

Hi @doubler, we support library usage:

from tensor2tensor.models import transformer
import tensorflow as tf

hparams = transformer.transformer_base()
encoder = transformer.TransformerEncoder(hparams, mode=tf.estimator.ModeKeys.TRAIN)
# Your inputs, which should be of shape [batch_size, timesteps, 1, hparams.hidden_size]
x = tf.placeholder(tf.float32, shape=[None, None, 1, hparams.hidden_size])
y = encoder({"inputs": x})

rsepassi on 20 May 2018
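The snippet above expects 4-D inputs, [batch_size, timesteps, 1, hidden], whose last dimension is the model's hidden size (`hparams.hidden_size`, 512 for `transformer_base()`), whereas most sequence features are 3-D. A NumPy sketch of the round trip follows; the helper names are hypothetical, and in TF or Keras the same steps are `tf.expand_dims`/`tf.squeeze` or `keras.backend.expand_dims`/`keras.backend.squeeze`.

```python
import numpy as np

HIDDEN_SIZE = 512  # transformer_base() uses hidden_size = 512


def to_t2t_layout(x, hidden_size=HIDDEN_SIZE):
    """[batch, timesteps, hidden] -> [batch, timesteps, 1, hidden]."""
    if x.ndim != 3:
        raise ValueError("expected a 3-D array [batch, timesteps, hidden]")
    if x.shape[-1] != hidden_size:
        raise ValueError("last dimension %d != hidden_size %d" % (x.shape[-1], hidden_size))
    return np.expand_dims(x, axis=2)  # insert the size-1 axis at position 2


def from_t2t_layout(y):
    """[batch, timesteps, 1, hidden] -> [batch, timesteps, hidden]."""
    return np.squeeze(y, axis=2)


x = np.zeros((4, 20, HIDDEN_SIZE), dtype=np.float32)
x4 = to_t2t_layout(x)
print(x4.shape)                   # (4, 20, 1, 512)
print(from_t2t_layout(x4).shape)  # (4, 20, 512)
```

Checking the last dimension up front gives a clear error message instead of a shape mismatch deep inside the model.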


@rsepassi your code seems to be what I need, but it is missing "targets" and throws an error:

  File "train.py", line 48, in transformerEncoder
    enc = encoder({'inputs': enc})
  File "/usr/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 717, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/tensor2tensor/utils/t2t_model.py", line 154, in call
    sharded_logits, losses = self.model_fn_sharded(sharded_features)
  File "/usr/lib/python2.7/site-packages/tensor2tensor/utils/t2t_model.py", line 209, in model_fn_sharded
    sharded_logits, sharded_losses = dp(self.model_fn, datashard_to_features)
  File "/usr/lib/python2.7/site-packages/tensor2tensor/utils/expert_utils.py", line 230, in __call__
    outputs.append(fns[i](*my_args[i], **my_kwargs[i]))
  File "/usr/lib/python2.7/site-packages/tensor2tensor/utils/t2t_model.py", line 253, in model_fn
    losses["training"] = self.loss(logits, features)
  File "/usr/lib/python2.7/site-packages/tensor2tensor/utils/t2t_model.py", line 412, in loss
    return self._loss_single(logits, target_modality, features["targets"])
KeyError: 'targets'

doubler on 21 May 2018


I have the same problem as @doubler. Why does the encoder need to know the targets?

marhlder on 1 Jun 2018

Hi @doubler and @marhlder, do you have any solution for the missing "targets" problem?

daiquocnguyen on 7 Dec 2018

@daiquocnguyen
I just set it like this:
encoder({"inputs": x, "targets": 0, "target_space_id": 0})
It works for me.

marhlder on 7 Dec 2018
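Why the workaround helps: the traceback earlier in the thread shows that calling the model runs `model_fn`, which computes `losses["training"] = self.loss(logits, features)`, and the default loss reads `features["targets"]`. The encoder body itself only needs `"inputs"`. The toy class below is hypothetical (it is not tensor2tensor code) and mirrors only that call path, to show why dummy targets make the `KeyError` go away while the resulting loss is simply ignored.

```python
class ToyT2TModel(object):
    """Hypothetical stand-in (not tensor2tensor code) mirroring the traceback's call path."""

    def body(self, features):
        # The "encoder" part only needs the inputs.
        return features["inputs"]

    def loss(self, logits, features):
        targets = features["targets"]  # KeyError: 'targets' if the key is absent
        # Placeholder: a real loss would compare logits with targets.
        return 0.0

    def __call__(self, features):
        # Mirrors the traceback: model_fn computes losses["training"] = self.loss(...).
        logits = self.body(features)
        losses = {"training": self.loss(logits, features)}
        return logits, losses


model = ToyT2TModel()
try:
    model({"inputs": [1.0, 2.0]})
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'targets'

# marhlder's workaround: dummy targets satisfy the lookup; the loss value is ignored.
outputs, _ = model({"inputs": [1.0, 2.0], "targets": 0, "target_space_id": 0})
print(outputs)  # [1.0, 2.0]
```

If you train the outputs with your own loss (as in the Keras examples later in the thread), the placeholder loss never enters the optimization.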

Thank you very much @marhlder

daiquocnguyen on 8 Dec 2018

> Hi @doubler and @marhlder, do you have any solution for the missing "targets" problem?

I didn't use tensor2tensor. The GPT model code at https://github.com/openai/finetune-transformer-lm is very clear and easy to use.

doubler on 9 Dec 2018


So when I run the code below in Keras, the model works:

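import keras
import tensorflow as tf
from keras_contrib.layers import CRF  # assumption: CRF is keras_contrib's CRF layer
from tensor2tensor.models import transformer


def transformer_code(inputLayer):
    hparams = transformer.transformer_base()
    encoder = transformer.TransformerEncoder(hparams, mode=tf.estimator.ModeKeys.TRAIN)
    # [batch, timesteps, hidden] -> [batch, timesteps, 1, hidden], the layout T2T expects
    x = keras.backend.expand_dims(inputLayer, axis=2)
    # Dummy "targets"/"target_space_id" avoid the KeyError: 'targets' raised by the loss
    y = encoder({"inputs": x, "targets": 0, "target_space_id": 0})
    # The encoder returns (outputs, losses); drop the size-1 axis from the outputs
    y = keras.backend.squeeze(y[0], 2)
    return y


def trainModel(args, trainInput, trainOutput, testInput, testOutput, taskName, tags):

    inputLayer = keras.layers.Input(shape=(len(trainInput[0]), len(trainInput[0][0])),
                                    dtype='float32')

    # Project features to 512, the hidden_size of transformer_base()
    inputAfterDense = keras.layers.Dense(512, activation='relu')(inputLayer)

    crfLayer = CRF(len(tags), sparse_target=True, name='result')

    y = keras.layers.Lambda(transformer_code)(inputAfterDense)
    modelPred = crfLayer(y)

    model = keras.Model(inputs=inputLayer, outputs=modelPred)
    model.compile(
        optimizer='adam',
        loss={'result': crfLayer.loss_function},
        metrics={'result': crfLayer.accuracy})
    print('finish model setting')
    model.summary()  # summary() prints by itself and returns None
    return model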

But if I remove the Dense layer 'inputAfterDense = keras.layers.Dense(512, activation='relu')(inputLayer)', training breaks and prediction accuracy stays near zero the whole time.

Why is that?

Eugen2525 on 4 Feb 2019
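One thing the Dense(512) layer does is make the encoder's input width equal to transformer_base's hidden_size of 512. My understanding (an assumption, not checked against every T2T version) is that the encoder wraps each sublayer in a residual connection whose output width is `hparams.hidden_size`, and that multi-head attention splits that width evenly across `hparams.num_heads` heads. That is consistent with Eugen later setting hidden_size to 196 with num_heads=7. The hypothetical helper below only checks those two constraints; it does not by itself explain the near-zero accuracy.

```python
def check_encoder_dims(input_dim, hidden_size=512, num_heads=8):
    """Return a list of (assumed) shape-constraint violations; empty means OK."""
    problems = []
    if input_dim != hidden_size:
        # Residual connections add x (width input_dim) to sublayer outputs (width hidden_size).
        problems.append("input_dim %d != hidden_size %d" % (input_dim, hidden_size))
    if hidden_size % num_heads != 0:
        # Multi-head attention splits hidden_size evenly across the heads.
        problems.append("hidden_size %d not divisible by num_heads %d" % (hidden_size, num_heads))
    return problems


print(check_encoder_dims(512))                                # [] (Dense(512) output)
print(check_encoder_dims(196, hidden_size=196, num_heads=7))  # [] (196 / 7 = 28 per head)
print(check_encoder_dims(196))  # ['input_dim 196 != hidden_size 512']
```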

I’m not a Keras expert, but are you sure the Lambda layer captures variables created in the function? It may be that the Dense layer’s variables are the only ones updating, and that’s why accuracy is 0 if you remove it.


rsepassi on 4 Feb 2019


So I checked as you said and removed the Dense layer altogether. From what I can tell, the Lambda layer's variables do get captured, because if I pass the input through with neither the Dense layer nor the Transformer, I still get around 0.30 accuracy for some epochs, while if I switch on the Transformer but switch off the Dense layer, I get the reported accuracy of about 0.001.
Interestingly, I am getting the warning below from TF:

The default implementation of loss requires that the model be used with a Problem. If using a Problem, augment the hparams object with trainer_lib.add_problem_hparams. If not, override loss.

I am using a custom number of heads (7) and a hidden_size of 196.

What gives?

Eugen2525 on 4 Feb 2019

I am new to the Transformer and want to replace the LSTM I have been using with it; any insight would be appreciated.
Below is my network. The input shape is (batch_size, time_step, no_feature), but my target has shape (batch_size, 1), i.e. one label for every 100 time steps:

import keras
from keras import optimizers
from keras.layers import Dense, Embedding, Lambda, Masking, Reshape

# X_tr, train_gen, val_gen, train_steps, val_steps, callbacks_list, the metrics
# f1/sensitivity/specificity and transformer_code are defined elsewhere.
adam = optimizers.Adam(lr=0.00075, decay=1e-6)
input1 = keras.layers.Input(shape=(X_tr.shape[1], 25), name='inp1')
input2 = keras.layers.Input(shape=(X_tr.shape[1], 11), name='inp2')
x2 = Embedding(40, 5, name='emb', mask_zero=True)(input2)  # [batch, steps, 11, 5]
x2 = Lambda(lambda t: t, output_shape=lambda s: s, name='lambdax2')(x2)  # identity layer
x2 = Reshape((int(x2.shape[1]), int(x2.shape[2] * x2.shape[3])), name='reshape1')(x2)  # [batch, steps, 55]
merge = keras.layers.Concatenate(axis=-1, name='concat')([input1, x2])  # [batch, steps, 80]
mask = Masking(mask_value=0., name='maski')(merge)
inputAfterDense = Dense(512, activation='relu', name='densetransformer')(mask)
# Named `encoded` rather than `transformer` so it does not shadow the tensor2tensor
# `transformer` module that transformer_code relies on.
encoded = Lambda(transformer_code, name='lambdatransformer')(inputAfterDense)
out = Dense(1, activation='sigmoid')(encoded)
model = keras.models.Model(inputs=[input1, input2], outputs=out)
model.compile(loss='binary_crossentropy', optimizer=adam,
              metrics=[f1, sensitivity, specificity, 'accuracy'])
model.summary()

history = model.fit_generator(train_gen, validation_data=val_gen,  # class_weight=class_weight,
                              validation_steps=val_steps, steps_per_epoch=train_steps,
                              epochs=1, verbose=1, shuffle=True, callbacks=callbacks_list)

But when I try to train the model, it returns the following error:

ValueError: Error when checking target: expected dense_1 to have 3 dimensions, but got array with shape (127, 1)
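The error comes from the output shape: `keras.layers.Dense` acts on the last axis, so applying `Dense(1)` to the encoder's 3-D output (batch, timesteps, hidden) gives (batch, timesteps, 1), while the targets are (batch, 1). One option, assuming a single label per sequence is intended, is to collapse the time axis first, e.g. with `keras.layers.GlobalAveragePooling1D()` before the final `Dense`. The NumPy sketch below only illustrates the shapes; `pool_and_classify` is a hypothetical name, and mean pooling is one choice among several (last time step, max pooling, etc.).

```python
import numpy as np


def pool_and_classify(encoder_out, w, b):
    """Collapse the time axis, then apply a 1-unit sigmoid classifier.

    encoder_out: [batch, timesteps, hidden]; w: [hidden, 1]; b: scalar bias.
    """
    pooled = encoder_out.mean(axis=1)     # [batch, hidden] (average over time)
    logits = pooled @ w + b               # [batch, 1]
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid probabilities


rng = np.random.default_rng(0)
enc = rng.normal(size=(127, 100, 64))  # batch of 127, 100 time steps, hidden width 64
probs = pool_and_classify(enc, rng.normal(size=(64, 1)) * 0.01, 0.0)
print(probs.shape)  # (127, 1), matching targets of shape (batch_size, 1)
```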