Support Vector Machines in Machine Learning


Support Vector Machines –

The fundamental idea behind Support Vector Machines is to fit the widest possible street between the classes. In other words, the goal is to have the largest possible margin between the decision boundary that separates the two classes and the training instances.

What is a Support Vector?

After training an SVM, a support vector is any instance located on the street, including its border. The decision boundary is entirely determined by the support vectors. Any instance that is not a support vector (i.e., off the street) has no influence whatsoever; you could remove such instances, add more, or move them around, and as long as they stay off the street they won't affect the decision boundary. Computing predictions involves only the support vectors, not the whole training set.
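As a quick illustration, the support vectors of a fitted model can be inspected directly via Scikit-Learn's SVC class. This sketch uses a small synthetic dataset (not the breast cancer data used later) purely to show the relevant attributes:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy dataset: two well-separated blobs (illustrative only)
X, y = make_blobs(n_samples=50, centers=2, random_state=42)

# Linear SVM; a large C approximates hard margin classification
clf = SVC(kernel='linear', C=1000)
clf.fit(X, y)

# The instances on the street — only these determine the boundary
print(clf.support_vectors_.shape)  # (n_support_vectors, n_features)
print(clf.n_support_)              # number of support vectors per class
```

Note that only a handful of the 50 training instances end up as support vectors; the rest could be deleted without changing the decision boundary.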

What is Hard and Soft Margin Classification in SVM?

If we strictly impose that all instances must be off the street and on the correct side, this is called hard margin classification. There are two main issues with hard margin classification. First, it only works if the data is linearly separable. Second, it is sensitive to outliers.

To avoid these issues, we use a more flexible model. The objective is to find a good balance between keeping the street as large as possible and limiting the margin violations (i.e. instances that end up in the middle of the street or even on the wrong side). This is called soft margin classification.
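In Scikit-Learn this balance is controlled by the C hyperparameter: a low C favors a wider street at the cost of more margin violations, while a high C does the opposite. A minimal sketch on synthetic, overlapping data (illustrative values, not tuned):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

# Overlapping blobs, so some margin violations are unavoidable (illustrative)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)

# Low C: prioritize a wide street, tolerate more margin violations
soft_clf = LinearSVC(C=0.01, max_iter=10000).fit(X, y)

# High C: prioritize fewer violations, accept a narrower street
hard_clf = LinearSVC(C=100, max_iter=10000).fit(X, y)

# The street width is 2 / ||w||, so a smaller weight norm means a wider street
print(np.linalg.norm(soft_clf.coef_), np.linalg.norm(hard_clf.coef_))
```

The low-C model should have the smaller weight norm, i.e. the wider street.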

How to Train a Linear SVM Model in Sklearn?

Let’s read a dataset to work with.

import pandas as pd
import numpy as np

url = 'https://raw.githubusercontent.com/bprasad26/lwd/master/data/breast_cancer.csv'
df = pd.read_csv(url)
df.head()

Next, split the data into a training set and a test set.

from sklearn.model_selection import train_test_split

X = df.drop('diagnosis', axis=1)
y = df['diagnosis']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Next, we will train a Linear SVM model in Sklearn. SVMs are sensitive to feature scales, so we will also scale the features using StandardScaler.

from sklearn.svm import LinearSVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# create a svm model with feature scaling
svm_clf = make_pipeline(StandardScaler(), LinearSVC(loss='hinge'))

# train it on the training set
svm_clf.fit(X_train, y_train)

# make predictions on the test set
y_pred = svm_clf.predict(X_test)

# measure accuracy
accuracy_score(y_test, y_pred)
# output
0.956140350877193

Instead of using the LinearSVC class, we could also use the SVC class with a linear kernel by writing SVC(kernel='linear').
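A sketch of that variant, dropped into the same kind of pipeline (here on a synthetic stand-in dataset rather than the breast cancer data, so the snippet runs on its own):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the breast cancer data (illustrative only)
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Same idea as before, but using SVC with a linear kernel
svc_clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
svc_clf.fit(X, y)

acc = svc_clf.score(X, y)
print(acc)
```

The two classes give similar (though not identical) results: LinearSVC uses the liblinear solver and scales better to large datasets, while SVC uses libsvm, which supports the kernel trick but trains more slowly as the dataset grows.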

Related Posts –

  1. What is the Kernel Trick in Support Vector Machines?
  2. How to Use Support Vector Machines for Regression?
