# How ReLU really does introduce non-linearity to a neural network

- stamatis_jl46n2e4
- January 9, 2021

Quora user Nathan Yan made a really nice and useful comment on Quora (https://www.quora.com/Why-is-ReLU-non-linear/answer/Nathan-Yan-2?comment_id=39524791&comment_type=2) about how ReLU really does introduce non-linearity to a neural network. Respect.

He made a short little program that simulates a random ReLU neural network with one hidden layer. The network takes a scalar input and produces a scalar output. To generate the graph, he runs the numbers from -100 to 100, incrementing by 0.01 each time, and plots the network's output for each input as the blue graph. Looking at the blue graph, you can clearly see the non-linearity: the graph isn't a straight line.
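The original program isn't shown in the post, but a minimal sketch of the idea might look like this. The hidden width of 16 and the use of NumPy are my own assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random one-hidden-layer ReLU network, scalar input -> scalar output.
# The hidden width (16) is an arbitrary choice; the original program
# from the answer is not shown.
W1 = rng.standard_normal(16)  # input-to-hidden weights
b1 = rng.standard_normal(16)  # hidden biases
W2 = rng.standard_normal(16)  # hidden-to-output weights

def relu_net(x):
    h = np.maximum(0.0, W1 * x + b1)  # hidden layer with ReLU
    return float(W2 @ h)              # scalar output

# Sweep the inputs from -100 to 100 in steps of 0.01, as in the post.
xs = np.arange(-100.0, 100.0, 0.01)
ys = np.array([relu_net(x) for x in xs])

# The finite-difference slope is not constant, so the graph of the
# network's output is not a straight line.
slopes = np.diff(ys) / 0.01
print(np.allclose(slopes, slopes[0]))
```

Plotting `ys` against `xs` reproduces the piecewise-linear blue graph the answer describes.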

On the other hand, look at a linear neural network:

The blue graph is now straight

Now, let's get into the nitty-gritty details and show why ReLU introduces non-linearity. Suppose we have a *linear* neural network with weight matrices A and B, layer inputs i^{1} and i^{2}, and layer outputs o^{1} and o^{2}.

To produce the output of layer one, we multiply the weight matrix A by the layer input i^{1}:

o^{1}=A⋅i^{1}

Clearly, o^{1} is just a set of linear combinations of the components of i^{1}. Since o^{1} is a linear function of i^{1}, and o^{2} is a linear function of o^{1} (or i^{2}, they're the same thing), it follows that o^{2} is a linear function of i^{1}.
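This collapse of two stacked linear layers into a single linear map can be checked numerically; the matrix sizes here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Arbitrary sizes for illustration: A maps 3 -> 4, B maps 4 -> 2.
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 4))
i1 = rng.standard_normal(3)   # layer-one input

o1 = A @ i1   # o^1 = A . i^1
o2 = B @ o1   # o^2 = B . i^2, where i^2 = o^1

# Two stacked linear layers are equivalent to the single matrix B @ A,
# so o^2 is a linear function of i^1.
print(np.allclose(o2, (B @ A) @ i1))  # True, by associativity
```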

Using differentiation, it's pretty simple to see that

do^{2}/di^{1} = B⋅A

is constant, and therefore the slope always stays the same, which explains the linear nature of the network.

With a ReLU network, on the other hand, we have to factor in that the activation function, the ReLU, sometimes returns 0. This can change the gradient.

In the image shown before, the green graph marks where the slope of the function changes, and the orange graph shows how many negative activations were in the network. Notice that every time the number of negative activations goes up or down, the green graph jumps. This is because every new negative activation changes the network dynamics, and the slope changes. This is how we get non-linearity. An interesting point, though: since both pieces of the ReLU are linear, the blue graph is really a composite of linear sections, so the network produces "non-linearity" overall even though its individual sections are linear.

Reference:

https://www.quora.com/Why-is-ReLU-non-linear/answer/Nathan-Yan-2?comment_id=39524791&comment_type=2

(If the owner of this comment would like this article deleted, just contact cyberlatentspace by email.)