Activation Functions Explained – GELU, SELU, ELU, ReLU and more

From the previous layer's activations, weights and biases, we calculate a value for every activation in the next layer. But before that value is passed on to the next layer, we apply an activation function to scale the output, and it is this scaled value that becomes the activation. In this post, we will explore the different activation functions used for this step.
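To make the forward-pass step concrete, here is a minimal NumPy sketch of a single layer using ReLU as its activation function. It is only an illustration; the shapes, values and names (`W`, `b`, `a_prev`) are made up for this example and are not the notation used in the rest of the post.

```python
import numpy as np

def relu(z):
    # ReLU clips negative pre-activations to zero
    return np.maximum(0, z)

# Hypothetical layer: 3 inputs feeding 4 neurons
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))            # weights of the layer
b = np.zeros(4)                        # biases of the layer
a_prev = np.array([0.5, -1.2, 0.3])    # activations from the previous layer

z = W @ a_prev + b   # weighted sum plus bias (the raw value)
a = relu(z)          # the activation function scales the output
print(a)             # these become the next layer's activations
```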

The prerequisite for this post is my previous post about feedforward and backpropagation in neural networks. There, I briefly talked about activation functions but never actually expanded on what they do for us. Much of what I cover here will only be relevant if you have that prior knowledge, or have read the previous post.

Neural Networks: Feedforward and Backpropagation Explained
What are neural networks? Developers should understand backpropagation to figure out why their code sometimes does not work. A visual and down-to-earth explanation of the math of backpropagation.

Code > Theory? Jump straight to the code.

Table of Contents

  1. Small Overview

  2. What is the sigmoid function?

  3. Gradient Problems: Backpropagation

  4. Rectified Linear Unit (ReLU)

  5. Exponential Linear Unit (ELU)

  6. Leaky Rectified Linear Unit (Leaky ReLU)

  7. Scaled Exponential Linear Unit (SELU)

  8. Gaussian Error Linear Unit (GELU)

  9. Code: Hyperparameter Search for Deep Neural Networks

  10. Further Reading: Books and Papers

Small Overview

Activation functions can make or break a neural network. In this extensive article (>6k words), I go over six different activation functions, each with its pros and cons. For each one, I give you the equation and the differentiated equation, along with plots of both. The goal is to explain the equations and graphs in simple input-output terms.

I also show you the vanishing and exploding gradient problems; for the latter, I follow Nielsen's great example of why gradients might explode.

Lastly, I provide some code that you can run yourself in a Jupyter Notebook.

From the small code experiment on the MNIST dataset, we obtain a loss and accuracy graph for each activation function.

Sigmoid function

The sigmoid function is a logistic function, which means that whatever you input, you get an output between 0 and 1. That is, every neuron, node or activation that you feed in is scaled to a value between 0 and 1.
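For reference, the standard definition of the sigmoid and its derivative is:

```latex
\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr)
```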

[...]

Source: https://mlfromscratch.com/activation-functions-explained/
