System information
Code to reproduce the issue
import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])
# Load MNIST and flatten each 28x28 image into a 784-vector scaled to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
model.compile(
    optimizer='adam',  # Utilize TFA optimizer
    loss=tfa.losses.SigmoidFocalCrossEntropy(),
    metrics=['accuracy'])
model.fit(
    x_train,
    y_train,
    batch_size=64,
    epochs=10)
Other info / logs
Train on 60000 samples
Epoch 1/10
64/60000 [..............................] - ETA: 44s
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-8-54bd6d7a40f5> in <module>()
9 y_train,
10 batch_size=64,
---> 11 epochs=10)
29 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
966 except Exception as e: # pylint:disable=broad-except
967 if hasattr(e, "ag_error_metadata"):
--> 968 raise e.ag_error_metadata.to_exception(e)
969 else:
970 raise
ValueError: in converted code:
/usr/local/lib/python3.6/dist-packages/tensorflow_addons/losses/focal_loss.py:123 sigmoid_focal_crossentropy *
y_true = tf.convert_to_tensor(y_true, dtype=y_pred.dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1256 convert_to_tensor_v2
as_ref=False)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1290 convert_to_tensor
(dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype float32 for Tensor with dtype uint8: <tf.Tensor 'y_true:0' shape=(None, 1) dtype=uint8>
There are two problems here:
1. The labels (y_train, y_test) are integers. They should be cast to float32.
2. Sparse labels aren't supported as of now.
The first thing you should do is this:
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10).astype(np.float32)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10).astype(np.float32)
I agree that we should provide this information in a better way, either in the docs or in the code usage. Here is a sample colab I created for your reference.
https://colab.research.google.com/drive/1Fekfd7AZF_lSPBC9L-IMxVJmohNfA6wP
I am closing this issue for now. If you encounter any other problem, feel free to reopen it.
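For readers who want to see what the suggested conversion does to the labels, here is a minimal NumPy sketch. It is only an illustration, not the TFA or Keras implementation: the helper name `to_one_hot` is hypothetical, and the three sample labels are made up.

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Convert integer labels of shape (N,) or (N, 1) to float32 one-hot rows."""
    flat = np.asarray(labels).reshape(-1).astype(np.int64)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[flat]

# uint8 labels shaped (N, 1), like the y_true tensor in the traceback above
y_sparse = np.array([[3], [0], [9]], dtype=np.uint8)
y_one_hot = to_one_hot(y_sparse, num_classes=10)
print(y_one_hot.shape, y_one_hot.dtype)
```

After this step the labels have the same shape as the model's 10-way output and the same float dtype, which addresses both problems listed above.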
Hey @AakashKumarNain, thanks for the answer!
If it only accepts float32, why not add the basic type casting inside the function itself? Wouldn't that be more convenient for the user?
Another question: why is this restricted to float32?
It can use any float data type. Also, it takes care of automatic conversion, but I need to look into why it failed at that point.
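To make the "cast inside the function" idea concrete, here is a rough NumPy sketch of a sigmoid focal loss that casts the labels to the dtype of the predictions before doing any arithmetic. This is an assumption-laden illustration of the standard focal loss formula, not TFA's code: the function name, the `eps` clipping, and the sample values are all hypothetical. NumPy would silently upcast mixed dtypes anyway, so here the cast mainly keeps the output dtype tied to the predictions; TensorFlow is stricter about dtype mismatches, as the traceback shows.

```python
import numpy as np

def sigmoid_focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-example sigmoid focal loss; y_pred holds probabilities in (0, 1)."""
    y_pred = np.asarray(y_pred)
    # Cast labels to the prediction dtype up front, so integer labels are accepted.
    y_true = np.asarray(y_true).astype(y_pred.dtype)
    p = np.clip(y_pred, eps, 1.0 - eps)
    # p_t is the probability assigned to the correct outcome for each entry.
    p_t = y_true * p + (1.0 - y_true) * (1.0 - p)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return loss.sum(axis=-1)

# One-hot labels stored as uint8, predictions stored as float32
labels = np.array([[0, 1, 0], [1, 0, 0]], dtype=np.uint8)
preds32 = np.array([[0.1, 0.8, 0.1], [0.3, 0.5, 0.2]], dtype=np.float32)

out32 = sigmoid_focal_loss(labels, preds32)
out64 = sigmoid_focal_loss(labels, preds32.astype(np.float64))
print(out32.dtype, out64.dtype)
```

The output dtype follows the predictions in both calls, which matches the statement that any float type can be used.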
@AakashKumarNain please check it. What is the problem?
import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import datasets, layers, models

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0
train_labels, test_labels = train_labels.astype('float32'), test_labels.astype('float32')
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))
model.compile(optimizer='adam',
              loss=tfa.losses.SigmoidFocalCrossEntropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE),
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=50,
                    validation_data=(test_images, test_labels))
Output is:
Train on 50000 samples, validate on 10000 samples
Epoch 1/50
50000/50000 [==============================] - 10s 192us/sample - loss: nan - accuracy: 6.0000e-05 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/50
50000/50000 [==============================] - 7s 148us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/50
50000/50000 [==============================] - 7s 146us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/50
50000/50000 [==============================] - 7s 144us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/50
50000/50000 [==============================] - 7s 145us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
There are some bugs. Check #1261. I will look into it, but I can't do it immediately.
Isn't model.fit's class_weight the same as focal loss? From what I gather from the documentation, both are used to weight the classes in the loss function. Or is there a difference?
Both are used to reweigh the classes but differ in their method. Focal loss reweighs each example based on its loss (how hard or easy it is to label), whereas class_weight applies a fixed weight per class, typically the inverse of the class frequency. You might get similar reweighing results, as the minority class will presumably be harder to label and hence will be weighted more heavily.
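A small NumPy sketch may make the difference easier to see. Everything in it is made up for illustration: the ten labels, the predicted probabilities, and the inverse-frequency formula `n_samples / (n_classes * count)`, which is one common heuristic rather than something Keras computes for you.

```python
import numpy as np

labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
# Hypothetical predicted probability of class 1 for each example
prob_class1 = np.array([0.05, 0.1, 0.1, 0.2, 0.1, 0.05, 0.6, 0.1, 0.3, 0.9])

# class_weight style: one fixed weight per class, here inverse class frequency
counts = np.bincount(labels, minlength=2)
class_weights = len(labels) / (2 * counts)
per_example_class_weight = class_weights[labels]

# Focal style: the weight depends on how confident the model is in the true class
gamma = 2.0
p_t = np.where(labels == 1, prob_class1, 1.0 - prob_class1)
focal_factor = (1.0 - p_t) ** gamma

print(per_example_class_weight)
print(np.round(focal_factor, 4))
```

The two minority examples (the last two) get the same class weight, but very different focal factors, because one of them is already classified confidently and the other is not. The misclassified majority example (index 6) also gets a large focal factor even though its class weight is small.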
I have yet to figure out a proper way to combine the non-balanced focal loss (alpha and gamma) for multi-class situations. Currently, in TF's version of FL, alpha (the re-weighing factor) is a float, which only makes sense for binary classification.
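One possible direction, sketched here purely as an assumption and not as anything TF or TFA provides, is to replace the scalar alpha with a vector holding one weight per class and apply it to a softmax-based focal loss. The function name `categorical_focal_loss`, its signature, and the sample values below are hypothetical.

```python
import numpy as np

def categorical_focal_loss(y_true, y_pred, alpha, gamma=2.0, eps=1e-7):
    """Hypothetical multi-class focal loss with one alpha weight per class.

    y_true: one-hot labels, shape (N, C)
    y_pred: softmax probabilities, shape (N, C)
    alpha:  per-class weights, shape (C,)
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(y_pred, dtype=np.float64), eps, 1.0)
    alpha = np.asarray(alpha, dtype=np.float64)
    # Only the true-class entry of each row contributes, scaled by its alpha.
    loss = -alpha * y_true * (1.0 - p) ** gamma * np.log(p)
    return loss.sum(axis=-1)

y_true = np.array([[1, 0, 0], [0, 0, 1]])
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]])
alpha = np.array([0.25, 0.5, 1.0])  # made-up per-class weights

losses = categorical_focal_loss(y_true, y_pred, alpha)
print(losses)
```

With `alpha` set to all ones and `gamma=0.0`, this reduces to ordinary categorical cross-entropy, which is a quick sanity check on the formula. How to choose the per-class weights is left open here.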