Basics : Support Vector Machines

Objective: Apply SVM (Support Vector Machines) to the popular Iris dataset and classify the flowers on the basis of their features.

Source: Udemy | Python for Data Science and Machine Learning Bootcamp

#importing libraries to be used
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
#view plots in jupyter notebook
%matplotlib inline
sns.set_style('whitegrid') #setting style for plots, optional

#importing data from the seaborn datasets
iris = sns.load_dataset('iris')

#display the dataframe to inspect its columns and a sample of rows
iris

	sepal_length	sepal_width	petal_length	petal_width	species
0	5.1	3.5	1.4	0.2	setosa
1	4.9	3.0	1.4	0.2	setosa
2	4.7	3.2	1.3	0.2	setosa
3	4.6	3.1	1.5	0.2	setosa
4	5.0	3.6	1.4	0.2	setosa
...	...	...	...	...	...
145	6.7	3.0	5.2	2.3	virginica
146	6.3	2.5	5.0	1.9	virginica
147	6.5	3.0	5.2	2.0	virginica
148	6.2	3.4	5.4	2.3	virginica
149	5.9	3.0	5.1	1.8	virginica

150 rows × 5 columns

#visualize/explore dataset
sns.pairplot(data=iris,hue='species')

[Figure: pairplot of the iris data, colored by species]

In the pairplot, setosa appears to be the most separable of the three species.
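
One rough way to check the separability claim numerically is to compare the range of a single feature across species. The sketch below is not part of the original notebook. It assumes scikit-learn's bundled copy of the Iris data as a stand-in for `sns.load_dataset('iris')` (so it runs without a download), and the renamed `petal_length` column and the variable names are chosen here for illustration.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the seaborn copy
data = load_iris(as_frame=True)
df = data.frame.rename(columns={'petal length (cm)': 'petal_length'})
df['species'] = data.target_names[data.target.to_numpy()]

# Smallest and largest petal length observed for each species
petal_range = df.groupby('species')['petal_length'].agg(['min', 'max'])
print(petal_range)

# Setosa is separable on this one feature if its largest value
# is below the smallest value of every other species
setosa_max = petal_range.loc['setosa', 'max']
others_min = petal_range.drop('setosa')['min'].min()
print(setosa_max < others_min)
```

If the final line prints `True`, a single threshold on petal length is enough to split setosa from the other two species, which is consistent with what the pairplot suggests.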

sns.jointplot(x='sepal_width',y='sepal_length',data=iris[iris['species']=='setosa'],kind='kde',cmap='coolwarm_r')

[Figure: KDE plot of sepal width and sepal length for setosa]

Split the data into train and test sets and train the model

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(iris.drop('species',axis=1),iris['species'],test_size=0.3, random_state=101)

from sklearn.svm import SVC

model_svc = SVC()

model_svc.fit(X_train,y_train)

SVC()

predictions = model_svc.predict(X_test)

from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test,predictions))

[[13  0  0]
 [ 0 19  1]
 [ 0  0 12]]

print(classification_report(y_test,predictions))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        13
  versicolor       1.00      0.95      0.97        20
   virginica       0.92      1.00      0.96        12

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45
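
The figures in the classification report can be recomputed by hand from the confusion matrix, which helps when reading either one. The sketch below copies the matrix printed above and relies on scikit-learn's convention that rows are true classes and columns are predicted classes. The variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([[13, 0, 0],
               [0, 19, 1],
               [0, 0, 12]])

correct = np.diag(cm)                 # correctly classified count per class
accuracy = correct.sum() / cm.sum()   # 44 correct out of 45
recall = correct / cm.sum(axis=1)     # divide by row sums (true counts)
precision = correct / cm.sum(axis=0)  # divide by column sums (predicted counts)

print(round(accuracy, 2))
print(np.round(recall, 2))
print(np.round(precision, 2))
```

The single off-diagonal entry is one versicolor flower predicted as virginica. It lowers versicolor's recall to 0.95 and virginica's precision to 0.92.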

With default parameters the model already reaches 98% accuracy on the test set (44 of 45 flowers classified correctly). Default parameters rarely work this well on harder problems, so we usually cross-validate over a range of parameter values to find the best combination. scikit-learn's GridSearchCV does this for us.
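
To make the cross-validation step concrete, here is a minimal sketch that scores a single parameter setting (the defaults) with 5-fold cross validation. It is not from the original notebook. It assumes scikit-learn's bundled Iris data instead of the seaborn copy, and for brevity it cross-validates on the full dataset rather than on `X_train` only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data stands in for the seaborn copy
X, y = load_iris(return_X_y=True)

# 5-fold cross validation: fit on four fifths of the data,
# score on the held-out fifth, and repeat for each of the five folds
scores = cross_val_score(SVC(), X, y, cv=5)

print(scores)         # one accuracy value per fold
print(scores.mean())  # average accuracy across folds
```

A grid search repeats this procedure for every combination of candidate parameter values and keeps the combination with the best average score.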

Grid search practice

from sklearn.model_selection import GridSearchCV
#Creating a dictionary of the parameter values to cross-validate over
param_grid = {'C': [0.1,1, 10, 100, 1000], 'gamma': [1,0.1,0.01,0.001,0.0001]}

grid = GridSearchCV(SVC(),param_grid,refit=True,verbose=3)
grid.fit(X_train,y_train)

Fitting 5 folds for each of 25 candidates, totalling 125 fits
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.905, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=1.000, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.905, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
[CV] ...................... C=0.1, gamma=1, score=0.905, total=   0.0s
[CV] C=0.1, gamma=1 ..................................................
.
.
.
[CV] C=1, gamma=0.001 ................................................


[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done   2 out of   2 | elapsed:    0.0s remaining:    0.0s


[CV] .................... C=1, gamma=0.001, score=0.714, total=   0.0s
[CV] C=1, gamma=0.001 ................................................
[CV] .................... C=1, gamma=0.001, score=0.714, total=   0.0s
[CV] C=1, gamma=0.001 ................................................
[CV] .................... C=1, gamma=0.001, score=0.714, total=   0.0s
[CV] C=1, gamma=0.0001 ...............................................
[CV] ................... C=1, gamma=0.0001, score=0.333, total=   0.0s
.
.
.
[CV] ................ C=1000, gamma=0.0001, score=1.000, total=   0.0s


[Parallel(n_jobs=1)]: Done 125 out of 125 | elapsed:    0.6s finished


GridSearchCV(estimator=SVC(),
             param_grid={'C': [0.1, 1, 10, 100, 1000],
                         'gamma': [1, 0.1, 0.01, 0.001, 0.0001]},
             verbose=3)

n_predictions = grid.predict(X_test)

print(confusion_matrix(y_test,n_predictions))

[[13  0  0]
 [ 0 19  1]
 [ 0  0 12]]

print(classification_report(y_test,n_predictions))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        13
  versicolor       1.00      0.95      0.97        20
   virginica       0.92      1.00      0.96        12

    accuracy                           0.98        45
   macro avg       0.97      0.98      0.98        45
weighted avg       0.98      0.98      0.98        45

Result: We were able to successfully train our model and predict iris flower species.

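The accuracy didn’t improve after the grid search: the confusion matrix is identical to the one from the default model. The default model was already very accurate and the dataset is quite small, so there was little room for improvement.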
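
To see which parameters the grid search selected, and to compare it directly with the default model, the fitted `GridSearchCV` object exposes `best_params_` and `best_score_`. The sketch below is a self-contained rerun under assumptions: it uses scikit-learn's bundled Iris data rather than the seaborn copy, so its numbers may not match the output above exactly, and `default_model` is a name introduced here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled Iris data stands in for the seaborn copy
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=101)

# Same parameter grid as in the post
param_grid = {'C': [0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001]}
grid = GridSearchCV(SVC(), param_grid, refit=True)
grid.fit(X_train, y_train)

# Baseline with default parameters, for comparison
default_model = SVC().fit(X_train, y_train)

print(grid.best_params_)  # parameter combination with the best mean CV score
print(grid.best_score_)   # that mean cross-validated accuracy
print(default_model.score(X_test, y_test), grid.score(X_test, y_test))
```

Because `refit=True`, the grid object retrains the best combination on all of `X_train`, which is why `grid.predict` and `grid.score` can be called directly.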

Vanya Sahu