What is Gradient Descent?
Let’s take linear regression as an example. The goal of linear regression is to find the line that best fits the data, and an error function such as mean squared error or mean absolute error tells us, on average, how far the line is from the data. So if we reduce this error as much as possible, we find the best-fitting line. In mathematics this process is called minimizing a function: finding the smallest value the function can return. Gradient descent is an algorithm for minimizing such an error function.
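To make the error function concrete, here is a minimal sketch of mean squared error for a candidate line y = w·x + b. The dataset and parameter values are made up purely for illustration:

```python
# Mean squared error for a candidate line y = w*x + b,
# computed over a tiny illustrative dataset.
def mse(w, b, xs, ys):
    # Average of the squared differences between predictions and targets
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(mse(2.0, 0.0, xs, ys))  # the line y = 2x fits this data perfectly: 0.0
print(mse(1.0, 0.0, xs, ys))  # a worse line produces a larger error
```

A lower return value means the line sits closer to the data on average, which is exactly the quantity gradient descent will drive down.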
A derivative tells you the slope, or rate of change, of the line that is tangent to a curve at any given point. The gradient generalizes this idea to functions of several variables: it collects the partial derivatives into a vector that points in the direction of steepest increase. Informally, it is just a fancy term for the slope or steepness of the curve.
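The slope idea can be checked numerically with a finite difference, a standard way to approximate a derivative (the function f(x) = x² here is just an example):

```python
# Estimate the slope (derivative) of f at x with a central finite difference.
def slope(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: x ** 2
print(slope(f, 3.0))  # close to 6.0, matching the analytic derivative 2x
```

At x = 3 the curve x² rises at a rate of about 6: a small step to the right increases the function roughly six times as fast.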
Gradient descent simply means updating the weights iteratively to descend the slope of the error curve until we reach the point with minimum error.
How does Gradient Descent work?
We can think of gradient descent as descending a mountain. Say you find yourself at the top of Mt. Everest and wish to climb down, but it is very foggy and you can only see about a meter ahead. What do you do? A good strategy is to look around and take a single step in the direction that takes you downhill the fastest. If you keep repeating this process many times, hopefully you will reach the bottom of the mountain. I say hopefully because, instead of reaching the global minimum at the bottom of the mountain, you could end up stuck in a local minimum, a valley partway down.
This is what Gradient Descent is all about.
1. Start somewhere on the mountain.
2. Find the direction in which a small step descends the most.
3. Take that small step.
4. Repeat steps 2–3 many times until you reach the bottom.
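The steps above can be sketched as a short loop. This is a one-parameter version of gradient descent on mean squared error for a line y ≈ w·x; the dataset, learning rate, and iteration count are all illustrative choices, not prescribed values:

```python
# Gradient descent on MSE for y ≈ w * x, following the steps above.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = 0.0                  # step 1: start somewhere
lr = 0.05                # learning rate: how small each step is (arbitrary choice)
for _ in range(200):     # step 4: repeat many times
    # step 2: the gradient of MSE with respect to w points uphill,
    # so its negative is the direction that descends the most
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    # step 3: take a small step downhill
    w -= lr * grad

print(round(w, 3))  # converges to 2.0, the slope of the best-fit line
```

Each pass moves w a fraction of the way toward the minimum, so after enough iterations the error is essentially zero for this toy dataset.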
In general, gradient descent tells us in which direction to take a step so that we descend the error curve the most.