If you look at the picture above, you can see that an Artificial Neural Network (ANN) consists of many neurons structured in layers to perform calculations and predict an output. This architecture is also called a **multilayer perceptron**. In the MLP diagram, each node is called a **neuron**. Let’s first understand how a single perceptron works, and then we will discuss how multiple perceptrons work together to learn data features.

## What is a perceptron?

The simplest neural network is the perceptron, which consists of a single neuron. Conceptually, the perceptron functions in a manner similar to a biological neuron. A biological neuron receives electrical signals through its dendrites, modulates them in various amounts, and then fires an output signal through its synapses only when the total strength of the input signals exceeds a certain threshold. The output is then fed to another neuron, and so forth.

To model the behavior of a biological neuron, the artificial neuron performs two consecutive functions. First, it calculates the **weighted sum** of the inputs to represent the total strength of the input signal, and then it applies a **step function** to the result to determine whether to fire: the output is 1 if the signal exceeds a certain threshold, or 0 if it doesn’t.

#### Connection Weights –

Not all input features are equally useful or important. To represent that, each input node is assigned a weight value, called its **connection weight**, to reflect its importance in the decision-making process. Inputs assigned a greater weight have a greater effect on the output: a high weight amplifies the input signal, and a low weight diminishes it. In a common representation of neural networks, the weights are drawn as lines or edges from the input nodes to the perceptron.

In the perceptron diagram you can see the following –

**Input vector –** The feature vector that is fed to the neuron. It is usually denoted with an uppercase X to represent a vector of inputs (x1, x2, …, xn).

**Weights vector –** Each input xi is assigned a weight value wi that represents its importance in distinguishing between different input datapoints.

**Neuron functions –** The calculations performed within the neuron to modulate the input signals: the weighted sum and the step activation function.

**Output –** Controlled by the type of activation function you choose for your network. There are different activation functions, which we will discuss later. For a step function, the output is either 0 or 1. Other activation functions produce probabilities or floating-point numbers. The output node represents the perceptron’s prediction.

#### Weighted Sum Function –

Also known as a linear combination, the weighted sum function is the sum of all inputs multiplied by their weights, plus a bias term. It produces the straight line represented by the following equation:

**z = w1·x1 + w2·x2 + … + wn·xn + b**

Here is how we implement the weighted sum in Python (assuming `numpy` has been imported as `np` and `w`, `X`, and `b` are already defined):

`z = np.dot(w.T, X) + b`

X is the input vector (uppercase X), w is the weight vector, and b is the y-intercept.

#### What is a bias in the perceptron and why do we add it?

The function of a straight line is represented by the equation **y = mx + b**, where b is the y-intercept. To be able to define a line, you need two things: the slope of the line and a point on the line. The bias gives you that point on the y-axis; it allows you to move the line up and down to better fit the prediction to the data. Without the **bias (b)**, the line would always have to go through the origin (0, 0) and you would get a poorer fit.

The input layer can be given a bias by introducing an extra input node that always has a value of 1. In a neural network, the value of the bias (b) is treated as an extra weight and is learned and adjusted by the neuron to minimize the cost function.
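This trick of treating the bias as an extra weight can be sketched in a few lines of NumPy (the input values, weights, and bias below are made up for illustration):

```
import numpy as np

X = np.array([0.2, 0.4, 0.6])   # illustrative input vector
w = np.array([0.1, 0.3, 0.5])   # illustrative weights
b = 0.5                         # bias

# Weighted sum with an explicit bias term.
z_plain = np.dot(w, X) + b

# Same result with the extra-input trick: append a constant input of 1
# and treat the bias as just another weight.
X_aug = np.append(X, 1.0)
w_aug = np.append(w, b)
z_trick = np.dot(w_aug, X_aug)

print(z_plain, z_trick)   # both equal 0.94
```

Because the bias is now just another weight, the same learning rule that adjusts the weights will adjust the bias too.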

#### Step Activation Function –

In both artificial and biological neural networks, a neuron does not just output the bare input it receives. Instead, there is one more step, called an **activation function**; this is the decision-making unit of the brain. In ANNs, the activation function takes the same weighted sum input from before and activates (fires) the neuron if the result is higher than a certain threshold. The simplest activation function used by the perceptron algorithm is the step function, which produces a binary output (0 or 1). It basically says that if the summed input z >= 0, the neuron fires (output = 1); otherwise (z < 0), it doesn’t fire (output = 0).

This is how the step function looks in Python:

```
def step_function(z):
    if z >= 0:
        return 1
    else:
        return 0
```

### How does the perceptron learn?

The perceptron’s learning logic goes like this.

- The neuron calculates the weighted sum and applies the activation function to make a prediction y_hat. This is called the feedforward process.
- It compares the output prediction with the correct label to calculate the error.
- It then updates the weights: if the prediction is too high, it adjusts the weights to make a lower prediction the next time, and vice versa.
- Repeat.

This process is repeated many times and the neuron continues to update the weights to improve its predictions until step 2 produces a very small error (close to zero), which means the neuron’s prediction is very close to the correct value. At this point, we can stop the training and save the weight values that yielded the best results to apply to future cases where the outcome is unknown.
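The loop above can be sketched in Python. This is a minimal version of the classic perceptron learning rule; the AND-gate dataset, the learning rate of 1, and the fixed number of epochs are illustrative choices, not prescribed here:

```
import numpy as np

def step_function(z):
    return 1 if z >= 0 else 0

# Toy linearly separable dataset: the logical AND gate.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.array([0, 0])   # weights, adjusted during training
b = 0                  # bias, treated as an extra weight
learning_rate = 1      # illustrative choice

for epoch in range(20):
    for xi, target in zip(X, y):
        y_hat = step_function(np.dot(w, xi) + b)   # 1. feedforward
        error = target - y_hat                     # 2. compare with the label
        w = w + learning_rate * error * xi         # 3. adjust the weights
        b = b + learning_rate * error              #    and the bias

predictions = [step_function(np.dot(w, xi) + b) for xi in X]
print(predictions)   # prints [0, 0, 0, 1] -- the AND labels
```

After a handful of epochs, the weights stop changing because every prediction matches its label; those are the weight values we would save and apply to future cases.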

### Multilayer Perceptron –

A single perceptron works great with simple datasets that can be separated by a line. But as you can imagine, the real world is much more complex than that. This is where neural networks can show their full potential.

**Linear vs. Nonlinear Datasets –**

**Linear dataset –** The data can be split with a single straight line.

**Nonlinear dataset –** The data cannot be split with a single straight line. We need more than one line to form a shape that splits the data.

To split a nonlinear dataset, we need more than one line, which means we need to come up with an architecture that uses tens or hundreds of neurons in our neural network.
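To see why a single perceptron is not enough, we can train one on XOR, the classic example of a nonlinear dataset (the training loop below is a minimal sketch; no choice of weights and bias can classify all four XOR points correctly, so the perceptron never converges):

```
import numpy as np

def step_function(z):
    return 1 if z >= 0 else 0

# XOR: the output is 1 only when the two inputs differ.
# No single straight line separates the 1s from the 0s.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

w = np.array([0, 0])
b = 0
for epoch in range(100):   # far more epochs than a separable dataset needs
    for xi, target in zip(X, y):
        error = target - step_function(np.dot(w, xi) + b)
        w = w + error * xi
        b = b + error

predictions = [step_function(np.dot(w, xi) + b) for xi in X]
correct = sum(p == t for p, t in zip(predictions, y))
print(correct)   # never reaches 4, no matter how long we train
```

The weights just cycle forever without ever getting all four points right, which is exactly the limitation the multilayer perceptron solves.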

#### Multilayer Perceptron Architecture –

We’ve seen how a neural network can be designed to have more than one neuron. Let’s expand on this idea with a more complex dataset. The diagram above is from the TensorFlow Playground website (https://playground.tensorflow.org ). We try to model a spiral dataset to distinguish between two classes. In order to fit this dataset, we need to build a neural network that contains tens of neurons. A very common neural network architecture is to stack the neurons in layers on top of each other, called **hidden layers**. Each layer has n number of neurons. Layers are connected to each other by weight connections. This leads to the **multilayer perceptron (MLP)** architecture in the figure.

The main components of the neural network architecture are as follows.

**Input layer –** Contains the feature vector.

**Hidden layers –** The neurons are stacked on top of each other in hidden layers. They are called hidden layers because we don’t see or control the input going into these layers or the output coming out of them. All we do is feed the feature vector to the input layer and see the output coming out of the output layer.

**Weighted connections (edges) –** Weights are assigned to each connection between the nodes to reflect the importance of their influence on the final output prediction. In graph network terms, these are called edges connecting the nodes.

**Output layer –** We get the answer or prediction from our model from the output layer. Depending on the setup of the neural network, the final output may be a real-valued output (regression problem) or a set of probabilities (classification problem). This is determined by the type of activation function used in the neurons of the output layer.
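As a minimal sketch of how these components fit together, here is the forward pass of a tiny 2-2-1 fully connected network in NumPy. The weights are hand-picked for illustration (not learned) so that the network computes XOR, something a single perceptron cannot do:

```
import numpy as np

def step_function(z):
    return (z >= 0).astype(int)   # vectorized step activation

# Hand-picked (not learned) weights for a 2-2-1 network that computes XOR:
# hidden neuron 1 acts like OR, hidden neuron 2 acts like AND,
# and the output neuron fires for "OR but not AND".
W_hidden = np.array([[1.0, 1.0],    # weights into hidden neuron 1
                     [1.0, 1.0]])   # weights into hidden neuron 2
b_hidden = np.array([-0.5, -1.5])
W_out = np.array([1.0, -2.0])
b_out = -0.5

def forward(x):
    h = step_function(W_hidden @ x + b_hidden)    # hidden layer
    return int(step_function(W_out @ h + b_out))  # output layer

for x in [[0, 0], [0, 1], [1, 0], [1, 1]]:
    print(x, forward(np.array(x)))   # prints the XOR of the two inputs
```

Each layer is just the weighted sum and activation from before, applied to the previous layer’s outputs instead of the raw inputs.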

#### What are hidden layers?

This is where the core of the feature-learning process takes place. When you look at the hidden-layer nodes in the figure above, you can see that the early layers detect simple patterns to learn low-level features (straight lines). Later layers detect patterns within patterns to learn more complex features and shapes, then patterns within patterns within patterns, and so on. In a neural network, we stack hidden layers to learn increasingly complex features until we fit our data. So when you are designing your neural network, if it is not fitting the data, one solution could be adding more hidden layers.

#### Fully connected layers –

It is important to call out that the layers in the classical MLP architecture are fully connected: each node in a layer is connected to all the nodes in the previous layer. This is called a fully connected network. These edges carry the weights that represent the importance of each node to the output value.
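Because every node connects to every node in the previous layer, the number of weights in a fully connected layer is simply (inputs × neurons), plus one bias per neuron. A quick sketch (the layer sizes below are hypothetical):

```
# Parameter count for a fully connected layer: each of the m neurons
# receives an edge from every one of the n nodes in the previous layer,
# plus one bias per neuron.
def dense_layer_params(n_inputs, n_neurons):
    return n_inputs * n_neurons + n_neurons

# Hypothetical MLP: 4 input features, hidden layers of 5 and 3 neurons,
# and a single output neuron.
layer_sizes = [4, 5, 3, 1]
total = sum(dense_layer_params(n, m)
            for n, m in zip(layer_sizes, layer_sizes[1:]))
print(total)   # (4*5+5) + (5*3+3) + (3*1+1) = 25 + 18 + 4 = 47
```

This multiplicative growth in edges is why fully connected networks get expensive quickly as layers widen.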