## Bias/Variance Trade-off

The bias/variance trade-off is something we should be aware of when building supervised learning models. We want a model that learns from the training data and also makes accurate predictions on new data. Basically, we want a model that is both reliable and generalizes well. A model that does both is one with low bias and low variance.

The trade-off is that as model complexity increases, bias decreases but variance increases. Since the total error is the sum of squared bias, variance, and irreducible noise, it is minimized at some point in the middle. A model with high variance is too complicated and tuned specifically to the training data; that's why it makes accurate predictions on the seen data but very poor predictions on unseen data. A model with high bias is too simple and does not learn the underlying pattern in the data. Striking this balance is the key to success.
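We can see this trade-off in a small experiment. The sketch below (my own illustration, using NumPy and polynomial degree as a stand-in for model complexity) fits noisy samples of a sine curve with polynomials of increasing degree: training error keeps falling as the degree grows, while validation error is lowest somewhere in the middle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a smooth nonlinear function (illustrative data).
x_train = np.sort(rng.uniform(0, 2 * np.pi, 30))
y_train = np.sin(x_train) + rng.normal(0, 0.2, x_train.size)
x_val = np.sort(rng.uniform(0, 2 * np.pi, 200))
y_val = np.sin(x_val) + rng.normal(0, 0.2, x_val.size)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

# Sweep complexity: degree 1 underfits, a mid degree balances,
# a very high degree chases the noise in the training set.
train_err, val_err = {}, {}
for degree in (1, 3, 15):
    # Polynomial.fit rescales x internally, which keeps the fit stable.
    poly = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_err[degree] = mse(y_train, poly(x_train))
    val_err[degree] = mse(y_val, poly(x_val))
    print(f"degree={degree:2d}  train MSE={train_err[degree]:.3f}  "
          f"val MSE={val_err[degree]:.3f}")
```

Training error can only go down as degree increases (each higher-degree model contains the lower-degree one), but validation error typically rises again once the polynomial starts fitting noise.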

We have a few ways to tell which of the two is higher and how to balance them out. If you notice that your model gets very high scores during training but low scores when evaluated on unseen data, the model is overfitting. This tells us it has high variance and low bias, and we can bring the variance down by reducing the complexity of the model. For a random forest or decision tree, that means pruning trees, reducing the depth, reducing the number of estimators, or lowering the number of features considered at each split. For a regression problem, it might mean using fewer features.
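As a concrete sketch of the tree case (my own toy example with scikit-learn; `max_depth` is the real `DecisionTreeRegressor` parameter that caps depth), compare a fully grown tree to a depth-limited one on noisy data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, (300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, 300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# A fully grown tree memorizes the training set: high variance.
deep = DecisionTreeRegressor(random_state=1).fit(X_train, y_train)
# Capping the depth reduces complexity and shrinks the train/test gap.
shallow = DecisionTreeRegressor(max_depth=4, random_state=1).fit(X_train, y_train)

for name, model in [("deep", deep), ("shallow", shallow)]:
    print(f"{name:7s} train R^2={model.score(X_train, y_train):.3f}  "
          f"test R^2={model.score(X_test, y_test):.3f}")
```

The unpruned tree scores nearly perfectly on training data but noticeably worse on the test split; limiting `max_depth` trades a little training accuracy for better generalization.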

Conversely, if you see that your model did poorly on both the training and testing phases, it has high bias and low variance: the model is underfitting. If this occurs, we want to lower the bias by increasing the complexity of the model. For a linear regression problem, that may mean adding more features (for example, polynomial terms) or using an ensemble of models.
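The underfitting fix can be sketched with plain NumPy least squares (a toy example of my own, assuming a quadratic ground truth): a straight line misses the curvature on both splits, while adding an x² feature lowers the bias and improves both scores.

```python
import numpy as np

rng = np.random.default_rng(2)

# Quadratic ground truth: a plain linear model underfits it.
x = rng.uniform(-3, 3, 200)
y = x ** 2 + rng.normal(0, 0.5, x.size)
x_tr, y_tr = x[:150], y[:150]
x_te, y_te = x[150:], y[150:]

def fit_and_score(feats_tr, feats_te):
    # Ordinary least squares on the given feature matrix.
    w, *_ = np.linalg.lstsq(feats_tr, y_tr, rcond=None)
    return (float(np.mean((feats_tr @ w - y_tr) ** 2)),
            float(np.mean((feats_te @ w - y_te) ** 2)))

# Low-complexity model: intercept + x only (underfits).
lin_tr_err, lin_te_err = fit_and_score(
    np.column_stack([np.ones_like(x_tr), x_tr]),
    np.column_stack([np.ones_like(x_te), x_te]))

# Higher-complexity model: add an x^2 feature to lower the bias.
quad_tr_err, quad_te_err = fit_and_score(
    np.column_stack([np.ones_like(x_tr), x_tr, x_tr ** 2]),
    np.column_stack([np.ones_like(x_te), x_te, x_te ** 2]))

print(f"linear:    train MSE={lin_tr_err:.2f}  test MSE={lin_te_err:.2f}")
print(f"quadratic: train MSE={quad_tr_err:.2f}  test MSE={quad_te_err:.2f}")
```

Unlike the overfitting case, adding complexity here improves the training *and* the test error, which is the signature of a high-bias model being fixed.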