Keras: Feature Request: Add GELU activation function

Created on 10 Dec 2018 · 8 comments · Source: keras-team/keras

I just realized that Keras does not have a GELU activation function in activations.py. I request that it be added, because it has many applications in neural networks.

Note: I'll probably submit a pull request for it.

  • [y] Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/keras-team/keras.git --upgrade --no-deps

  • [y] Check that your version of TensorFlow is up-to-date. The installation instructions can be found here.
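
For reference, a minimal sketch of what such an activation could look like in activations.py, using the tanh approximation from the paper (the function name and backend calls below are my assumption, not an existing Keras implementation):

import numpy as np
from keras import backend as K

def gelu(x):
    """Gaussian Error Linear Unit (tanh approximation, https://arxiv.org/abs/1606.08415)."""
    # Hypothetical sketch only: a smooth, non-monotonic alternative to ReLU.
    return 0.5 * x * (1.0 + K.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * K.pow(x, 3))))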

feature

Most helpful comment

GELU activation has started to pick up, and it was published a while ago (2016):
https://arxiv.org/abs/1606.08415

It has also been used in OpenAI's GPT-1 and GPT-2 and in Google's BERT papers. I would love to see this implemented in Keras activations.

All 8 comments

I don't think this should be merged into Keras.

  • Not widely used
  • Not published yet

Please submit your PR at keras-contrib.

GELU activation has started to pick up, and it was published a while ago (2016):
https://arxiv.org/abs/1606.08415

It has also been used in OpenAI's GPT-1 and GPT-2 and in Google's BERT papers. I would love to see this implemented in Keras activations.

Code from Google's BERT:

import numpy as np
import tensorflow as tf

def gelu(x):
    """Gaussian Error Linear Unit.
    This is a smoother version of the ReLU.
    Original paper: https://arxiv.org/abs/1606.08415
    Args:
        x: float Tensor to perform activation.
    Returns:
        `x` with the GELU activation applied.
    """
    cdf = 0.5 * (1.0 + tf.tanh(
        (np.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3)))))
    return x * cdf

Code from OpenAI's GPT-2:

def gelu(x):
    return 0.5*x*(1+tf.tanh(np.sqrt(2/np.pi)*(x+0.044715*tf.pow(x, 3))))
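
Both snippets implement the same tanh approximation. As a quick sanity check (my own sketch, not from either codebase), it can be compared against the exact erf-based definition x * Phi(x); the two agree very closely:

import numpy as np
from scipy.special import erf

def gelu_exact(x):
    # Exact GELU: x times the standard normal CDF.
    return 0.5 * x * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation used in the BERT and GPT-2 snippets above.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3))))

x = np.linspace(-5.0, 5.0, 1001)
print(np.max(np.abs(gelu_exact(x) - gelu_tanh(x))))  # tiny, well below 1e-2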

This guy uses it, and he clearly knows what's going on:
https://github.com/borisbanushev/stockpredictionai
keras code:

import numpy as np
import tensorflow as tf

from keras.layers import Activation, Dense
from keras.utils.generic_utils import get_custom_objects

def custom_gelu(x):
    # Tanh approximation of GELU (same formula as in the BERT/GPT-2 snippets above).
    return 0.5 * x * (1 + tf.tanh(tf.sqrt(2 / np.pi) * (x + 0.044715 * tf.pow(x, 3))))

get_custom_objects().update({'custom_gelu': Activation(custom_gelu)})
fit1.add(Dense(units=1, activation=custom_gelu))  # fit1 is the Sequential model built earlier

Something's wrong with that custom activation, though: I'm getting really strange predictions that never go below ~-0.25.

It's not wrong that you are not getting values below -0.25; look at the graph of the function:
[plot of the GELU function]
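
A quick numerical check (my own sketch using the tanh approximation from the snippets above, not from the thread) confirms this: the function has a global minimum of roughly -0.17 near x ≈ -0.75, so a layer with this activation cannot output values much below that:

import numpy as np

def gelu(x):
    # Same tanh approximation of GELU as in the snippets above.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * np.power(x, 3))))

x = np.linspace(-5.0, 5.0, 10001)
y = gelu(x)
print(y.min(), x[np.argmin(y)])  # approximately -0.17 at x close to -0.75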

I know that it is starting to get confusing, but I need to make a cross-org reference: https://github.com/tensorflow/tensorflow/pull/33945

GELU is in TensorFlow: https://github.com/tensorflow/tensorflow/pull/41178. You can close this.
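
Once that PR is in an installed TensorFlow release (TF 2.4 is my assumption for the first version carrying it), the built-in activation can be used directly, e.g.:

import tensorflow as tf

# Built-in GELU, assuming a TensorFlow release that already includes the PR above.
layer = tf.keras.layers.Dense(1, activation="gelu")
print(layer(tf.constant([[-1.0], [0.0], [1.0]])))

# Functional form; approximate=True selects the tanh approximation discussed above.
print(tf.nn.gelu(tf.constant([-1.0, 0.0, 1.0]), approximate=True))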

Thank you, @bhack! I will close this issue :)
