Addons: SigmoidFocalCrossEntropy gives ValueError

Created on 5 Mar 2020 · 7 Comments · Source: tensorflow/addons

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
TensorFlow version and how it was installed (source or binary): '2.1.0' - colab default idk
TensorFlow-Addons version and how it was installed (source or binary): '0.8.2' - colab default idk
Python version: 3.6.9
Is GPU used? (yes/no): yes

Code to reproduce the issue

import tensorflow as tf
import tensorflow_addons as tfa

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(784,), activation='relu', name='dense_1'),
    tf.keras.layers.Dense(64, activation='relu', name='dense_2'),
    tf.keras.layers.Dense(10, activation='softmax', name='predictions'),
])
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
# Flatten the 28x28 images and scale pixel values to [0, 1].
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
# y_train is left as uint8 class IDs; this is what triggers the ValueError below.
model.compile(
    optimizer='adam',  # built-in Keras Adam optimizer
    loss=tfa.losses.SigmoidFocalCrossEntropy(),
    metrics=['accuracy'])
model.fit(
    x_train,
    y_train,
    batch_size=64,
    epochs=10)

Other info / logs

Train on 60000 samples
Epoch 1/10
   64/60000 [..............................] - ETA: 44s
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-54bd6d7a40f5> in <module>()
      9     y_train,
     10     batch_size=64,
---> 11     epochs=10)

29 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in wrapper(*args, **kwargs)
    966           except Exception as e:  # pylint:disable=broad-except
    967             if hasattr(e, "ag_error_metadata"):
--> 968               raise e.ag_error_metadata.to_exception(e)
    969             else:
    970               raise

ValueError: in converted code:

    /usr/local/lib/python3.6/dist-packages/tensorflow_addons/losses/focal_loss.py:123 sigmoid_focal_crossentropy  *
        y_true = tf.convert_to_tensor(y_true, dtype=y_pred.dtype)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1256 convert_to_tensor_v2
        as_ref=False)
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1290 convert_to_tensor
        (dtype.name, value.dtype.name, value))

    ValueError: Tensor conversion requested dtype float32 for Tensor with dtype uint8: <tf.Tensor 'y_true:0' shape=(None, 1) dtype=uint8>

All 7 comments

There are two problems here:

1. You are passing your ground truth (y_train, y_test) as integers. The labels should be one-hot encoded and cast to float32, so the first thing to do is this:

y_train = tf.keras.utils.to_categorical(y_train, num_classes=10).astype(np.float32)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10).astype(np.float32)

2. Sparse labels aren't supported as of now. I agree that we should communicate this better, either in the docs or in the code.
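
To make the expected label format concrete, here is a minimal NumPy-only sketch of what one-hot encoding produces. The helper name `one_hot` is hypothetical and is only meant to mirror the result of the `to_categorical` call above, not to replace it.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Return a float32 one-hot matrix for integer class IDs (hypothetical helper)."""
    labels = np.asarray(labels).reshape(-1)  # accept shape (N,) or (N, 1)
    return np.eye(num_classes, dtype=np.float32)[labels]


# Three labels stored as uint8, the dtype the MNIST loader returns.
y = np.array([5, 0, 4], dtype=np.uint8)
y_one_hot = one_hot(y, num_classes=10)

print(y_one_hot.shape, y_one_hot.dtype)
```

Each row has a single 1.0 in the column of its class, so the label array has the same shape and float dtype as the model's 10-way output.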

Here is a sample colab I created for your reference.
https://colab.research.google.com/drive/1Fekfd7AZF_lSPBC9L-IMxVJmohNfA6wP

I am closing this issue for now. If you encounter any other problem, feel free to reopen it.

AakashKumarNain on 6 Mar 2020

Hey @AakashKumarNain, thanks for the answer!
If the loss only accepts float32, why not add a basic type cast inside the function? Wouldn't that be more convenient for the user?
My other question: why is this restricted to float32?

us on 6 Mar 2020

It can use any float data type. It is also supposed to handle the conversion automatically, but I need to look into why it failed at that point.

AakashKumarNain on 7 Mar 2020

@AakashKumarNain please check this. What is the problem?

import tensorflow as tf
import tensorflow_addons as tfa
from tensorflow.keras import datasets, layers, models

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Scale pixel values to [0, 1]; labels stay as class IDs, only cast to float32.
train_images, test_images = train_images / 255.0, test_images / 255.0
train_labels, test_labels = train_labels.astype('float32'), test_labels.astype('float32')

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.3))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))

model.compile(optimizer='adam',
              loss=tfa.losses.SigmoidFocalCrossEntropy(reduction=tf.keras.losses.Reduction.SUM_OVER_BATCH_SIZE),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=50,
                    validation_data=(test_images, test_labels))

Output is:

Train on 50000 samples, validate on 10000 samples
Epoch 1/50
50000/50000 [==============================] - 10s 192us/sample - loss: nan - accuracy: 6.0000e-05 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 2/50
50000/50000 [==============================] - 7s 148us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 3/50
50000/50000 [==============================] - 7s 146us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 4/50
50000/50000 [==============================] - 7s 144us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00
Epoch 5/50
50000/50000 [==============================] - 7s 145us/sample - loss: nan - accuracy: 0.0000e+00 - val_loss: nan - val_accuracy: 0.0000e+00

us on 25 Mar 2020

There are some bugs; see #1261. I will look into it, but I can't do so immediately.
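
Separately from those bugs, the CIFAR-10 snippet differs from the loss's default expectations in two visible ways: the labels are class IDs of shape (N, 1) rather than one-hot rows, and the last layer is `Dense(10)` with no activation while `SigmoidFocalCrossEntropy` defaults to `from_logits=False`. Whether either of these causes the NaN is not confirmed in this thread. The sketch below is a hypothetical NumPy pre-flight check (the name `check_focal_inputs` is made up for illustration) that flags both conditions on a small batch.

```python
import numpy as np


def check_focal_inputs(y_true, y_pred):
    """Return a list of likely input problems; the list is empty if none are found."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    problems = []
    if y_true.shape != y_pred.shape:
        problems.append(
            f"label shape {y_true.shape} != prediction shape {y_pred.shape}")
    if not np.issubdtype(y_true.dtype, np.floating):
        problems.append(f"labels have dtype {y_true.dtype}, expected a float type")
    if y_pred.min() < 0.0 or y_pred.max() > 1.0:
        problems.append("predictions fall outside [0, 1]; they look like logits")
    return problems


# Mimics the snippet above: float class IDs of shape (N, 1) and raw logits.
labels = np.array([[3.0], [7.0]], dtype=np.float32)
logits = np.linspace(-2.0, 2.0, 20, dtype=np.float32).reshape(2, 10)
problems = check_focal_inputs(labels, logits)

# One-hot labels and probabilities in [0, 1] pass the check.
good_labels = np.eye(10, dtype=np.float32)[[3, 7]]
probs = np.full((2, 10), 0.1, dtype=np.float32)
no_problems = check_focal_inputs(good_labels, probs)

print(problems)
print(no_problems)
```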

AakashKumarNain on 25 Mar 2020

Isn't model.fit's class_weight the same as focal loss? From what I gather from the documentation, both are used to weight the classes in the loss function. Or is there a difference?

sandorvasas on 1 Aug 2020

Both reweight the loss, but they differ in method. Focal loss reweights each example by how hard it is to classify (the lower the predicted probability of the true class, the higher the weight), whereas class_weight applies a fixed weight per class, typically chosen as the inverse of the class frequency. You might get similar reweighting results, since the minority class will presumably be harder to classify and will therefore be weighted more heavily.
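
The difference can be made concrete with a small NumPy sketch. It follows the usual sigmoid focal loss formulation with alpha=0.25 and gamma=2.0, but it is an illustration under those assumptions, not the TFA implementation, and the class_weight values are made up.

```python
import numpy as np


def sigmoid_focal_loss(y_true, p, alpha=0.25, gamma=2.0, eps=1e-7):
    """Element-wise focal loss on probabilities p (illustrative sketch)."""
    p = np.clip(p, eps, 1.0 - eps)
    ce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    p_t = y_true * p + (1.0 - y_true) * (1.0 - p)            # prob. of the true label
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    return alpha_t * (1.0 - p_t) ** gamma * ce


# Two positive examples of the same class: one easy (p=0.9), one hard (p=0.1).
y_true = np.array([1.0, 1.0])
p = np.array([0.9, 0.1])
plain = -np.log(p)                                  # ordinary cross-entropy

# Focal loss: the effective weight depends on the prediction.
ratio = sigmoid_focal_loss(y_true, p) / plain

# class_weight: the weight depends only on the class, not on the prediction.
class_weight = {0: 1.0, 1: 5.0}                     # hypothetical values
cw_ratio = np.array([class_weight[int(c)] for c in y_true])

print(ratio)
print(cw_ratio)
```

The easy example is scaled by 0.25 * (1 - 0.9)^2 and the hard one by 0.25 * (1 - 0.1)^2, so the two examples get very different weights under focal loss, while class_weight gives both the same weight because they share a class.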

I have yet to figure out a proper way to combine the non-balanced focal loss (alpha and gamma) for multi-class situations. Currently, in TFA's version of focal loss, alpha (the reweighting factor) is a single float, which only makes sense for binary classification.
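
One possible direction, sketched here purely as an assumption and not as anything TFA provides, is to let alpha be a vector with one entry per class and rely on broadcasting. The function name `focal_loss_per_class_alpha` and the alpha values below are hypothetical.

```python
import numpy as np


def focal_loss_per_class_alpha(y_true, p, alpha, gamma=2.0, eps=1e-7):
    """Focal loss with a per-class alpha of shape (C,); returns one value per example.

    y_true and p have shape (N, C); y_true is one-hot, p holds probabilities.
    """
    p = np.clip(p, eps, 1.0 - eps)
    alpha = np.asarray(alpha, dtype=p.dtype)                 # broadcasts over the batch
    ce = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    p_t = y_true * p + (1.0 - y_true) * (1.0 - p)
    alpha_t = y_true * alpha + (1.0 - y_true) * (1.0 - alpha)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).sum(axis=-1)


y_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])
p = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.3, 0.4]])

per_class = focal_loss_per_class_alpha(y_true, p, alpha=[0.25, 0.5, 0.75])
scalar = focal_loss_per_class_alpha(y_true, p, alpha=0.25)
uniform = focal_loss_per_class_alpha(y_true, p, alpha=[0.25, 0.25, 0.25])

print(per_class.shape)
```

With every entry of the vector set to the same value, this reduces to the scalar-alpha case, so the vector form is a strict generalization under these assumptions.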