Describe the feature and the current behavior/state.
Currently, the F1-score cannot be meaningfully used as a metric in Keras neural network models, because Keras computes the F1-score on each batch step and averages these per-batch values, which results in values that are too small. This is why the F1-score was removed from Keras, see https://github.com/keras-team/keras/issues/5794, where a quick workaround is also proposed. TFA's F1-score exhibits exactly the same problem when used with Keras.
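To make the batch problem concrete, here is a minimal NumPy sketch (plain Python, not Keras code) comparing the F1-score over a whole dataset with the average of per-batch F1-scores. The helper name `f1_score` is hypothetical, and returning 0.0 when a batch has no true or predicted positives is an assumed convention.

```python
import numpy as np


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0 (assumed convention)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


# Perfect predictions: every label is predicted correctly.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = y_true.copy()

# F1 over the whole dataset.
global_f1 = f1_score(y_true, y_pred)

# F1 computed per batch of 4 and then averaged, as a batch-wise metric would be.
batch_size = 4
batch_f1s = [
    f1_score(y_true[i:i + batch_size], y_pred[i:i + batch_size])
    for i in range(0, len(y_true), batch_size)
]
batch_mean_f1 = float(np.mean(batch_f1s))

print(global_f1, batch_mean_f1)  # 1.0 0.5
```

The first batch contains no positive samples, so its F1 is 0 even though every prediction in it is correct; averaging over batches therefore drags the reported value well below the true F1 of 1.0.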
Relevant information
Which API type would this fall under (layer, metric, optimizer, etc.)
metric
Who will benefit with this feature?
Keras users
Any other info.
Here is some code showing the problem. fbeta_score stays well below 1.0 in the last epoch (0.8495 in the run shown below), although the predictions are 100% accurate. The code requires tfa 0.7.0, which adds the threshold parameter to the F1-score.
import numpy as np
import random
import tensorflow_addons as tfa
from keras import models
from keras import layers
threshold = 0.9
random.seed(1)
# Random binary labels; the first feature equals the label, so the task is trivially learnable
train_labels = np.array([random.randint(0, 1) for _ in range(150000)])
train_data = np.array([[x, 0] for x in train_labels])
network = models.Sequential()
network.add(layers.Dense(1, activation='sigmoid', input_shape=(train_data.shape[1],)))
network.compile(optimizer='rmsprop', loss='mae',
                metrics=['acc', tfa.metrics.FBetaScore(num_classes=2, average="micro", threshold=threshold)])
network.fit(train_data, train_labels, epochs=10, batch_size=128)
predicted_data = network.predict(train_data)
predicted_data_boolean = np.array([0 if x < threshold else 1 for x in predicted_data])
print(all(predicted_data_boolean == train_labels))
Here is the output, which shows an F1-score that is too low (it should be 1.0, because the predicted labels are equal to the training labels):
Epoch 1/10
150000/150000 [==============================] - 2s 11us/step - loss: 0.2835 - acc: 1.0000 - fbeta_score: 0.0000e+00
Epoch 2/10
150000/150000 [==============================] - 2s 10us/step - loss: 0.1757 - acc: 1.0000 - fbeta_score: 0.0000e+00
Epoch 3/10
150000/150000 [==============================] - 1s 10us/step - loss: 0.1063 - acc: 1.0000 - fbeta_score: 0.0848
Epoch 4/10
150000/150000 [==============================] - 1s 10us/step - loss: 0.0624 - acc: 1.0000 - fbeta_score: 0.4408
Epoch 5/10
150000/150000 [==============================] - 1s 10us/step - loss: 0.0359 - acc: 1.0000 - fbeta_score: 0.6159
Epoch 6/10
150000/150000 [==============================] - 1s 10us/step - loss: 0.0204 - acc: 1.0000 - fbeta_score: 0.7071
Epoch 7/10
150000/150000 [==============================] - 1s 10us/step - loss: 0.0115 - acc: 1.0000 - fbeta_score: 0.7632
Epoch 8/10
150000/150000 [==============================] - 2s 11us/step - loss: 0.0065 - acc: 1.0000 - fbeta_score: 0.8012
Epoch 9/10
150000/150000 [==============================] - 2s 13us/step - loss: 0.0036 - acc: 1.0000 - fbeta_score: 0.8286
Epoch 10/10
150000/150000 [==============================] - 2s 12us/step - loss: 0.0020 - acc: 1.0000 - fbeta_score: 0.8495
True
I just found that there is a way of computing precision, recall and related metrics directly in Keras (but not, it seems, the F1-score) without running into the batch problem described above:
from tensorflow import keras

# Stateful metrics: their counts accumulate across batches instead of being averaged per batch
METRICS = [
    keras.metrics.TruePositives(name='tp'),
    keras.metrics.FalsePositives(name='fp'),
    keras.metrics.TrueNegatives(name='tn'),
    keras.metrics.FalseNegatives(name='fn'),
    keras.metrics.BinaryAccuracy(name='accuracy'),
    keras.metrics.Precision(name='precision'),
    keras.metrics.Recall(name='recall'),
    keras.metrics.AUC(name='auc'),
]

model.compile(..., metrics=METRICS)  # `model` and the other compile arguments are elided
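For reference, the F1-score follows directly from the counts that the tp, fp and fn metrics above accumulate, using the standard definitions of precision P and recall R:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}
```

Because TP, FP and FN are summed over all batches of an epoch, an F1-score computed from them is not affected by the batch problem.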
Hi,
Thanks for opening this issue! It looks like the Keras team removed some global metrics starting with Keras 2.0.0, because these metrics are not meaningful when approximated batch-wise. I'll take a look at the linked callback workaround and help contribute when I have time :)
It seems that keras.metrics.Precision(name='precision') and keras.metrics.Recall(name='recall') already solve the batch problem. So Keras would only need to add the obvious F1 computation from these values.
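Following this suggestion, a minimal sketch of such a metric, assuming TF 2.x with tf.keras (TF >= 2.5, where the reset hook is named `reset_state`). The class name `F1FromPrecisionRecall` is hypothetical, not a Keras API. It delegates to the stateful `Precision` and `Recall` metrics, so the F1 value is computed from counts accumulated over all batches rather than averaged per batch.

```python
import numpy as np
import tensorflow as tf


class F1FromPrecisionRecall(tf.keras.metrics.Metric):
    """Hypothetical F1 metric that reuses Keras' stateful Precision and Recall."""

    def __init__(self, thresholds=0.5, name="f1", **kwargs):
        super().__init__(name=name, **kwargs)
        # Both sub-metrics accumulate TP/FP/FN counts across batches.
        self.precision_metric = tf.keras.metrics.Precision(thresholds=thresholds)
        self.recall_metric = tf.keras.metrics.Recall(thresholds=thresholds)

    def update_state(self, y_true, y_pred, sample_weight=None):
        self.precision_metric.update_state(y_true, y_pred, sample_weight=sample_weight)
        self.recall_metric.update_state(y_true, y_pred, sample_weight=sample_weight)

    def result(self):
        p = self.precision_metric.result()
        r = self.recall_metric.result()
        # Harmonic mean of the accumulated precision and recall; 0 when both are 0.
        return tf.math.divide_no_nan(2.0 * p * r, p + r)

    def reset_state(self):
        # Assumes TF >= 2.5, where Keras metrics expose reset_state().
        self.precision_metric.reset_state()
        self.recall_metric.reset_state()


f1 = F1FromPrecisionRecall()
# Two batches: the first has no positives, the second has only positives.
f1.update_state(np.array([0.0, 0.0]), np.array([0.1, 0.2]))
f1.update_state(np.array([1.0, 1.0]), np.array([0.9, 0.8]))
print(float(f1.result()))  # 1.0
```

Evaluated on the first batch alone, this metric would report 0, but after both batches it reports 1.0, the F1-score over all samples seen. It could be passed to `model.compile(..., metrics=[F1FromPrecisionRecall()])` like any other Keras metric.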
There is a F1 Metric implementation for Keras here:
https://github.com/tensorflow/addons/blob/master/tensorflow_addons/metrics/f_scores.py
This issue should be closed imo.
This metric suffers from the batch problem, as demonstrated by my code above.
@tillmo Well, then I should bring the code back to my small tool lib...
@all: It's really a shame that we (the Addons, Keras and TensorFlow teams) have not managed to implement a proper F1 function. This is so basic that I would not call any tool complete without it. Sorry for these self-critical words.
@Saishruthi and @squadrick what do you think about this?
@PhilipMay I think you have implemented a proper f1 function. It just does not interact well with Keras. And maybe the place to have an f1 function that interacts well with Keras is Keras, and not tfa. After all, Keras already provides precision and recall, so f1 cannot be a big step.
I changed my old f1 code to tf.keras. It is back and usable now.
@PhilipMay are there any issues you see with adding your implementation into Addons? We have precedent for function-specific imports:
https://github.com/tensorflow/addons/blob/master/tensorflow_addons/callbacks/tqdm_progress_bar.py#L68
And I would prefer a working implementation with external dependencies vs. a buggy one. If there are no other issues would you be willing to submit a PR?
Sure. Will do that the next days.
Thank you @PhilipMay for working on this. Please feel free to send a PR to the tensorflow repo directly and skip the migration step since this is a metric we want in the main repo.
@pavithrasv I will do that, although I am pretty sure that my implementation will need further discussion and fine-tuning.
@pavithrasv, @seanpmorgan and @karmel: I started a discussion about the implementation in the TF repo: https://github.com/tensorflow/tensorflow/issues/36799
@PhilipMay I've been busy and couldn't sync up with this thread in a while. I'm following the discussion. Thanks for taking the time to do this.
Ok so I took a closer look at the script demonstrating the bug. I believe there are two small mistakes:
1) The code snippet uses multi-backend keras instead of tf.keras. TF Addons classes were never intended to be used with multi-backend keras: TF Addons metrics subclass tf.keras.metrics.Metric, but keras expects a keras.metrics.Metric object. I don't even know why this doesn't throw an error; I guess it's multi-backend keras' fault for not raising one here. For future readers: don't use multi-backend keras. It's deprecated.
2) The threshold for the Fbeta score is set to 0.9, while Keras' accuracy metric uses a threshold of 0.5 by default, which explains the remaining discrepancy between the accuracy numbers and the Fbeta score.
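The effect of the threshold mismatch in point 2 can be illustrated with a small NumPy sketch. The probabilities below are made-up sigmoid outputs, the helper names are hypothetical, and it assumes, as in Keras' binary accuracy, that an output counts as positive when it exceeds the threshold.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1])
# Hypothetical sigmoid outputs from a model that is not yet fully trained.
y_prob = np.array([0.2, 0.95, 0.7, 0.4, 0.6])


def binarize(probs, threshold):
    # Predict 1 where the probability exceeds the threshold.
    return (probs > threshold).astype(int)


def f1_score(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


acc_at_05 = float(np.mean(binarize(y_prob, 0.5) == y_true))
f1_at_05 = f1_score(y_true, binarize(y_prob, 0.5))
f1_at_09 = f1_score(y_true, binarize(y_prob, 0.9))
print(acc_at_05, f1_at_05, f1_at_09)  # 1.0 1.0 0.5
```

With the same model outputs, accuracy at 0.5 is perfect while the F1-score at 0.9 is only 0.5, because two true positives fall between the two thresholds and become false negatives.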
Here is the version of the script with the two issues fixed:
import numpy as np
import random
import tensorflow_addons as tfa
from tensorflow.keras import models
from tensorflow.keras import layers
threshold = 0.5
random.seed(1)
train_labels = np.array([random.randint(0,1) for iter in range(150000)])
train_data = np.array([[x,0] for x in train_labels])
network = models.Sequential()
network.add(layers.Dense(1, activation='sigmoid', input_shape=(train_data.shape[1],)))
network.compile(optimizer='rmsprop', loss='mae',
                metrics=['acc', tfa.metrics.FBetaScore(num_classes=2, average="micro", threshold=threshold)])
network.fit(train_data, train_labels, epochs=10, batch_size=128)
predicted_data = network.predict(train_data)
predicted_data_boolean = np.array([0 if x < threshold else 1 for x in predicted_data])
print(all(predicted_data_boolean == train_labels))
Here is the output:
2020-03-15 20:34:44.321274: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-03-15 20:34:44.325879: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3499995000 Hz
2020-03-15 20:34:44.326328: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55fa6a91c8b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-03-15 20:34:44.326373: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
Train on 150000 samples
Epoch 1/10
150000/150000 [==============================] - 1s 9us/sample - loss: 0.4313 - acc: 0.8722 - fbeta_score: 0.8848
Epoch 2/10
150000/150000 [==============================] - 1s 8us/sample - loss: 0.2987 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 3/10
150000/150000 [==============================] - 1s 8us/sample - loss: 0.1922 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 4/10
150000/150000 [==============================] - 1s 8us/sample - loss: 0.1175 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 5/10
150000/150000 [==============================] - 1s 8us/sample - loss: 0.0694 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 6/10
150000/150000 [==============================] - 1s 7us/sample - loss: 0.0400 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 7/10
150000/150000 [==============================] - 1s 8us/sample - loss: 0.0228 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 8/10
150000/150000 [==============================] - 1s 8us/sample - loss: 0.0129 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 9/10
150000/150000 [==============================] - 1s 7us/sample - loss: 0.0072 - acc: 1.0000 - fbeta_score: 1.0000
Epoch 10/10
150000/150000 [==============================] - 1s 7us/sample - loss: 0.0041 - acc: 1.0000 - fbeta_score: 1.0000
True
I believe the error we made here is not realizing that @tillmo was talking about multi-backend keras in all his messages (I just realized now). So to answer your question @tillmo:
The keras-team/keras repo will soon be overwritten with the code of tf.keras. See https://github.com/tensorflow/community/blob/master/rfcs/20200205-standalone-keras-repository.md
@gabrieldemarmiesse, thanks for the explanation. I was not aware of the difference between multi-backend Keras and tf.keras, or that the former is deprecated. Your link provides some explanation, but it only discusses a reorganisation of Keras in relation to TensorFlow; the end of the multi-backend nature is not discussed. Presumably it is an implicit consequence? (A quite severe one...)
You can get a bit more info about it at https://keras.io/
The main reason was this: it is not realistic for the Keras maintainers to keep maintaining backends that represent only 2% of users. Furthermore, CNTK and Theano are both deprecated.