## What is Logistic Regression and How Does it Work?

Logistic Regression (also called Logit Regression) is commonly used to estimate the probability that an instance belongs to a particular class (e.g., what is the probability that this email is spam?). If the estimated probability is greater than 50%, the model predicts that the instance belongs to that class (called the positive class, labeled “1”); otherwise, it predicts that it does not (i.e., it belongs to the negative class, labeled “0”). This makes it a binary classifier.

So how does Logistic Regression work?

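Just like a Linear Regression model, a Logistic Regression model computes a weighted sum of the input features (plus a bias term), but instead of outputting the result directly like the Linear Regression model does, it outputs the logistic of this result (see the equation below).

$$\hat{p} = \sigma(\mathbf{w}^\top \mathbf{x} + b)$$

Here $\mathbf{x}$ is the feature vector, $\mathbf{w}$ is the vector of feature weights, $b$ is the bias term, and $\sigma$ is the logistic function.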

The logistic, written $\sigma(t)$, is a sigmoid function (i.e. S-shaped) that outputs a number between 0 and 1. It is defined as shown below.

$$\sigma(t) = \frac{1}{1 + e^{-t}}$$
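To make the shape of the logistic function concrete, here is a minimal NumPy sketch. The helper name `sigmoid` and the sample scores are illustrative choices, not part of any library API.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# A few sample scores, from strongly negative to strongly positive
scores = np.array([-6.0, -1.0, 0.0, 1.0, 6.0])
probs = sigmoid(scores)

for t, p in zip(scores, probs):
    print(f"sigmoid({t:+.1f}) = {p:.4f}")
```

Negative scores map to values below 0.5, a score of zero maps to exactly 0.5, and positive scores map to values above 0.5, which is why the curve is S-shaped.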

Once the Logistic Regression model has estimated the probability $\hat{p}$ that an instance $\mathbf{x}$ belongs to the positive class, it can make its prediction $\hat{y}$ easily: $\hat{y} = 1$ if $\hat{p} > 0.5$, and $\hat{y} = 0$ otherwise.
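The whole prediction step (weighted sum, logistic, threshold) can be sketched in a few lines of NumPy. The weights `w`, bias `b`, and the three instances in `X` below are made-up values chosen only for illustration; a real model would learn `w` and `b` from training data.

```python
import numpy as np

def sigmoid(t):
    """Logistic function, mapping a score to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical parameters for a model with two features (illustrative only)
w = np.array([0.8, -0.5])  # feature weights
b = 0.1                    # bias term

# Three made-up instances, one per row
X = np.array([
    [2.0, 1.0],
    [0.0, 3.0],
    [1.0, 1.0],
])

scores = X @ w + b                 # weighted sum of the features plus bias
p_hat = sigmoid(scores)            # estimated probability of the positive class
y_hat = (p_hat > 0.5).astype(int)  # 1 if p_hat > 0.5, otherwise 0

print("scores:", scores)
print("p_hat: ", p_hat.round(3))
print("y_hat: ", y_hat)
```

Because the logistic function equals 0.5 exactly when its input is 0, thresholding the probability at 50% is the same as checking whether the weighted sum is positive.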

## How to Train a Logistic Regression Model in Sklearn?

Let’s read a dataset to work with.

```
import pandas as pd
import numpy as np
url = 'https://raw.githubusercontent.com/bprasad26/lwd/master/data/breast_cancer.csv'
df = pd.read_csv(url)
df.head()
```

Next, split the data into training and test sets.

```
from sklearn.model_selection import train_test_split
X = df.drop('diagnosis', axis=1)
y = df['diagnosis']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

Now train a logistic regression model and measure the accuracy.

```
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
# create a LogisticRegression object
log_reg = LogisticRegression()
# train it on the training set
log_reg.fit(X_train, y_train)
# make predictions on test set
y_pred = log_reg.predict(X_test)
# measure accuracy
accuracy_score(y_test, y_pred)
```

```
# output
0.9649122807017544
```
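Since the model is built around estimated probabilities, it can be useful to look at them directly rather than only at the final labels. The sketch below uses scikit-learn's `predict_proba` method. As an assumption made so the example runs offline, it loads scikit-learn's built-in breast cancer dataset via `load_breast_cancer` instead of the CSV above, so its columns and label encoding may differ from that file. It also raises `max_iter` above the default of 100, a choice made here to give the solver more room on unscaled features.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumption: built-in dataset used as an offline stand-in for the CSV above
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: max_iter raised from its default of 100 for unscaled features
log_reg = LogisticRegression(max_iter=10000)
log_reg.fit(X_train, y_train)

# One row per instance, one column per class, ordered as in log_reg.classes_
proba = log_reg.predict_proba(X_test)
print("classes:", log_reg.classes_)
print(proba[:5].round(3))

# Choosing the most probable class should reproduce what predict() returns
manual_pred = log_reg.classes_[np.argmax(proba, axis=1)]
agree = bool((manual_pred == log_reg.predict(X_test)).all())
print("matches predict():", agree)
```

Each row of `proba` sums to 1, and the second column holds the estimated probability of the class listed second in `log_reg.classes_`.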