Support Vector Machine (SVM) - A Practice Problem || Machine Learning Series

Support Vector Machine (SVM) is one of the popular classification algorithms used in machine learning. We have already discussed different types of classification algorithms; if you want a quick revision, check the blog and refresh your knowledge.

Let's dive deep into Support Vector Machine classification

    It's a supervised machine learning classification algorithm that classifies data points into distinct classes. It does this by finding a hyperplane that separates the data points of the different classes.

    What is a hyperplane? In an n-dimensional Euclidean space, a hyperplane is a flat (n-1)-dimensional surface that divides the space into two parts, one for each class. That means in a two-dimensional space the hyperplane is a line, and in 3D space the hyperplane is a plane. The hyperplane is usually placed so as to maximize the margin, that is, the distance between the hyperplane and the support vectors (the data points closest to the hyperplane, which determine its position and orientation) is made as large as possible.
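To make the idea of the margin concrete, here is a minimal sketch that compares two candidate separating lines on a tiny, made-up 2D dataset. The data, the `margin` helper and both candidate lines are assumptions for illustration only; nothing is trained here, we just measure distances.

```python
import numpy as np

# Hypothetical toy data: class 0 sits at x1 = 0, class 1 sits at x1 = 3.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])

def margin(w, b, points):
    """Smallest distance from any point to the hyperplane w.x + b = 0."""
    distances = np.abs(points @ w + b) / np.linalg.norm(w)
    return distances.min()

# Two candidate separating lines, both vertical (w points along x1).
w = np.array([1.0, 0.0])
margin_middle = margin(w, -1.5, X_toy)  # line x1 = 1.5, halfway between the classes
margin_offset = margin(w, -0.5, X_toy)  # line x1 = 0.5, hugging class 0

print(margin_middle, margin_offset)
```

Both lines separate the two classes, but the middle one keeps every point 1.5 units away while the offset one leaves only 0.5. An SVM prefers the hyperplane with the larger margin.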

    If our data points are not linearly separable, we cannot separate them using a simple line. In that case we need another technique: we project the data points into a higher dimension. In the higher dimension the data takes a different shape and can become linearly separable. After separating the classes there, we can project the result back into the original dimension. For example, data that is not linearly separable in 2D can be projected to 3D, separated there using a hyperplane, and then projected back to 2D.
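The projection idea can be sketched by hand with two concentric rings of points, which no straight line can separate in 2D. The feature map `z = x1^2 + x2^2`, the ring radii and the threshold below are all assumptions chosen for illustration; a real SVM achieves a similar effect implicitly through its kernel rather than through an explicit function like `lift_to_3d`.

```python
import numpy as np

# Hypothetical data: an inner ring (class 0) and an outer ring (class 1).
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
inner = np.column_stack([1.0 * np.cos(angles), 1.0 * np.sin(angles)])
outer = np.column_stack([3.0 * np.cos(angles), 3.0 * np.sin(angles)])

def lift_to_3d(points):
    """Append a third coordinate z = x1^2 + x2^2 (an illustrative feature map)."""
    z = (points ** 2).sum(axis=1)
    return np.column_stack([points, z])

inner_3d = lift_to_3d(inner)
outer_3d = lift_to_3d(outer)

# In 3D the flat plane z = 5 separates the two rings.
z_threshold = 5.0
inner_below = inner_3d[:, 2] < z_threshold
outer_above = outer_3d[:, 2] > z_threshold
print(inner_below.all(), outer_above.all())

# Projected back to 2D, that plane becomes a circle of this radius.
boundary_radius = np.sqrt(z_threshold)
```

In the lifted space the inner ring sits at height 1 and the outer ring at height 9, so a flat plane between them does the job. Back in 2D, the same boundary is a circle between the two rings.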


Let's implement a Support Vector Classifier using Python

 We are using the Social Network Ads dataset for this problem, which is also used in other classification problems. The dataset contains details of users of a social networking site, and the goal is to predict whether a user buys a product by clicking the ad on the site, based on their salary, age and gender.


Let's start programming by importing the essential libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
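To see what the two `iloc` selections pick out, here is a small sketch on a hand-made DataFrame. The column names and all the values are assumptions about how the CSV is laid out (an ID column first, then gender, age, salary, and the label last); check them against your own copy of the file.

```python
import pandas as pd

# Hypothetical rows; column names and values are assumed, not read from the real CSV.
sample = pd.DataFrame({
    "User ID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 47],
    "EstimatedSalary": [19000, 20000, 43000, 25000],
    "Purchased": [0, 0, 0, 1],
})

# Columns 1, 2 and 3 become the features; the last column becomes the label.
X_sample = sample.iloc[:, [1, 2, 3]].values
y_sample = sample.iloc[:, -1].values

feature_names = list(sample.columns[[1, 2, 3]])
print(feature_names)
print(X_sample.shape, y_sample.shape)
```

Column 0 (the ID) is skipped on purpose, since an identifier carries no information about whether a user buys.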
Since our dataset contains a character (string) variable, we have to encode it as numbers using LabelEncoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
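Here is a short sketch of what `LabelEncoder` does to a string column. The gender values below are made up for illustration; the encoder assigns integer codes to the class labels in sorted order.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical gender column.
genders = np.array(["Male", "Female", "Female", "Male"])

gender_encoder = LabelEncoder()
encoded = gender_encoder.fit_transform(genders)

# classes_ holds the original labels; the position of each label is its code.
mapping = {str(label): code for code, label in enumerate(gender_encoder.classes_)}
print(mapping)
print(encoded)

# inverse_transform recovers the original strings from the codes.
decoded = gender_encoder.inverse_transform(encoded)
```

With only two categories this simple integer encoding is fine. For a column with more than two unordered categories, one-hot encoding is usually a safer choice.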
We are performing a train-test split on our dataset. We provide the test size as 0.20, which means that the training set contains 320 samples and the test set contains 80 samples
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
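The 320/80 split can be checked on dummy data of the same size. The 400-row count is inferred from the numbers above, and the arrays are placeholders, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with 400 rows (assumed dataset size) and 3 features.
n_rows = 400
X_dummy = np.arange(n_rows * 3).reshape(n_rows, 3)
y_dummy = np.arange(n_rows) % 2

Xa_train, Xa_test, ya_train, ya_test = train_test_split(
    X_dummy, y_dummy, test_size=0.20, random_state=0
)
print(Xa_train.shape, Xa_test.shape)

# The same random_state gives exactly the same split again.
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_dummy, y_dummy, test_size=0.20, random_state=0
)
same_split = np.array_equal(Xa_test, Xb_test)
```

Fixing `random_state` is what makes the results reproducible from run to run.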
Next, we apply feature scaling to the independent variables of the training and test sets. The scaler is fitted on the training set only and then applied to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
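As a rough sketch of what the scaler computes, the snippet below compares `StandardScaler` with the manual formula (value minus the training mean, divided by the training standard deviation). The age and salary numbers are made up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical [age, salary] rows.
train = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0], [50.0, 80000.0]])
test = np.array([[35.0, 50000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from the training rows
test_scaled = scaler.transform(test)        # reuses the training mean and std

# Manual version of the same computation on the training rows.
manual = (train - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(train_scaled, manual))
print(test_scaled)
```

The test row here happens to equal the training mean, so it scales to zeros. Scaling matters for SVM because salary values would otherwise dominate age in every distance calculation.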
Next is the important step, SVC model creation. For this we import SVC (Support Vector Classifier) from sklearn.svm
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)
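Once a linear SVC is fitted, you can look inside it. The sketch below uses a tiny made-up dataset (not the ad data) to show the support vectors, the hyperplane weights and the decision scores.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data.
X_toy = np.array([[-2.0, -1.0], [-1.0, -1.0], [-1.0, -2.0],
                  [1.0, 1.0], [1.0, 2.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

toy_classifier = SVC(kernel="linear", random_state=0)
toy_classifier.fit(X_toy, y_toy)

support_vectors = toy_classifier.support_vectors_  # points that define the margin
n_support = toy_classifier.n_support_              # support vector count per class
weights = toy_classifier.coef_                     # w, available for the linear kernel
bias = toy_classifier.intercept_                   # b

new_points = np.array([[2.0, 2.0], [-2.0, -2.0]])
scores = toy_classifier.decision_function(new_points)  # the sign picks the side
labels = toy_classifier.predict(new_points)
print(support_vectors)
print(labels)
```

Note that `coef_` exists only when `kernel='linear'`; with other kernels the hyperplane lives in the projected space and has no direct weight vector.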
Let's predict the test results
y_pred = classifier.predict(X_test)
Predicted and actual values - [image: y_pred and y_test arrays shown side by side]
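If you want to compare predictions and actual labels without opening a variable viewer, one way is to stack the two arrays as columns. The values below are hypothetical stand-ins for `y_pred` and `y_test`, not the real output of the model.

```python
import numpy as np

# Hypothetical stand-ins for y_pred and y_test.
y_pred_demo = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1])
y_test_demo = np.array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0])

# Column 0 is the prediction, column 1 is the actual value.
side_by_side = np.column_stack([y_pred_demo, y_test_demo])
print(side_by_side)

# Row indices where the prediction and the actual value disagree.
mismatches = np.flatnonzero(y_pred_demo != y_test_demo)
print(mismatches)
```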


For the first 8 values, both are the same. We can evaluate our model using the confusion matrix and accuracy score, which compare the predicted and actual test values
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
confusion matrix - [image: confusion matrix]

ac  -  0.9
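To read a confusion matrix, it helps to see how accuracy falls out of it. The labels below are invented for illustration and are not the model's real predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical actual and predicted labels.
y_true_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred_demo = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 0])

# Rows are actual classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
tn, fp, fn, tp = cm_demo.ravel()

acc = accuracy_score(y_true_demo, y_pred_demo)
acc_from_cm = (tn + tp) / cm_demo.sum()  # correct predictions sit on the diagonal
print(cm_demo)
print(acc, acc_from_cm)
```

The diagonal holds the correct predictions and the off-diagonal cells hold the two kinds of mistakes. By the same arithmetic, an accuracy of 0.9 on 80 test samples means 72 correct predictions.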

An accuracy of 0.9 is good. Note that you may be able to achieve better results for this problem using different algorithms.


Full Code -
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the categorical (string) column as numbers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the SVM model on the Training set
from sklearn.svm import SVC
classifier = SVC(kernel = 'linear', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)

That was all for the SVM algorithm. Next, we will get to know more about the Naive Bayes classification algorithm and how it can improve our model.

Keep reading!
