Decision Tree Classification - A Practice problem || Machine Learning


    In this part of the classification series, we will get familiar with the Decision Tree classification algorithm. If you want to learn about regression and some other classification algorithms, check our previous posts.

    As we discussed in our decision tree regression article, a decision tree repeatedly splits the dataset into smaller and smaller subsets, forming a tree-like structure. Some of the key terms in Decision Tree are:
  • Splitting - The process of dividing the dataset (or a node) into smaller sub-units.
  • Parent and Child Node - A node that gets divided into sub-nodes is called a parent node, and the sub-nodes it produces are called child nodes. The parent node at the starting point of the entire splitting process is called the Root Node.
  • Subtree / Branch - A section of the entire tree: a node together with all the subnodes that grow below it.
  • Decision Node - A subnode that splits into further subnodes is called a decision node.
  • Terminal / Leaf Node - A node at the bottom of the tree that does not split any further is called a terminal or leaf node.
  • Pruning - The opposite of splitting, that is, the process of removing subnodes from the tree.
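To connect these terms to an actual fitted tree, here is a minimal sketch that uses scikit-learn's `DecisionTreeClassifier` on synthetic data from `make_classification` (an assumption for illustration, not the article's dataset). It counts decision nodes (including the root) and leaf nodes through the fitted `tree_` object, then uses cost-complexity pruning (`ccp_alpha`, an arbitrary value chosen here) to show pruning removing subnodes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data, used only to illustrate tree terminology
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
tree = clf.tree_

# In scikit-learn's tree structure, node 0 is the root and a leaf has no children (index -1)
is_leaf = tree.children_left == -1
n_leaves = int(is_leaf.sum())
n_decision = tree.node_count - n_leaves  # root + every other node that splits further

print("total nodes:", tree.node_count)
print("decision nodes (incl. root):", n_decision)
print("leaf nodes:", n_leaves, "| get_n_leaves():", clf.get_n_leaves())

# Pruning: cost-complexity pruning cuts back subtrees whose extra leaves are not worth it
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X, y)
print("leaves before pruning:", clf.get_n_leaves(), "after:", pruned.get_n_leaves())
```

Every decision node in a scikit-learn tree has exactly two children, so the node count is always `2 * n_leaves - 1`.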


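    In decision tree classification, the best attribute is selected using an attribute selection measure (ASM), such as information gain (based on entropy) or the Gini index. That attribute becomes the decision node and the data is split on it. The process then repeats on each subset until a stopping condition is met, for example when a node contains samples of only one class or no further split is possible.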
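As a small worked example of an attribute selection measure, the sketch below computes entropy, Gini impurity and information gain with NumPy. The helper names (`entropy`, `gini`, `information_gain`) are hypothetical, written only for this illustration. It compares two candidate splits of the same parent node: one that separates the classes perfectly and one that leaves them just as mixed.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (base 2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    # sum of p * log2(1/p); p > 0 always, since np.unique only returns classes that occur
    return float((p * np.log2(1.0 / p)).sum())

def gini(labels):
    """Gini impurity of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# Split A separates the classes perfectly; split B leaves both children 50/50
gain_a = information_gain(parent, parent[:4], parent[4:])
gain_b = information_gain(parent, parent[[0, 1, 4, 5]], parent[[2, 3, 6, 7]])
print(entropy(parent), gini(parent))  # 1.0 0.5
print(gain_a, gain_b)                 # 1.0 0.0
```

With the entropy criterion, the tree would choose split A because it has the higher information gain.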

Example

    Let's start our Python implementation of Decision Tree classification. We are using the same Social Network Ads dataset (Social_Network_Ads.csv) that we used in our previous classification posts, so the different classification algorithms can be compared. The dataset contains details of users of a social networking site, and the task is to predict whether a user buys a product after clicking an ad on the site, based on their gender, age and salary.


Let's start the programming by importing the essential libraries required for our problem:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
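Next, we import the dataset and slice it into the independent variables (gender, age and salary, columns 1 to 3) and the dependent variable (the last column, whether the user made a purchase):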
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
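To see what `iloc[:, [1, 2, 3]]` and `iloc[:, -1]` pick out, here is a sketch on a tiny DataFrame. The rows are made up for illustration, and the column layout (User ID, Gender, Age, EstimatedSalary, Purchased) is an assumption about Social_Network_Ads.csv. Because the selected columns mix text and numbers, `.values` returns a NumPy array with `object` dtype.

```python
import pandas as pd

# Hypothetical rows; only the (assumed) column layout matters here
toy = pd.DataFrame({
    "User ID": [101, 102, 103],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 47],
    "EstimatedSalary": [19000, 20000, 25000],
    "Purchased": [0, 0, 1],
})

X = toy.iloc[:, [1, 2, 3]].values  # Gender, Age, EstimatedSalary (User ID is dropped)
y = toy.iloc[:, -1].values         # last column: Purchased

print(toy.columns[[1, 2, 3]].tolist())  # ['Gender', 'Age', 'EstimatedSalary']
print(X.dtype, X.shape, y.tolist())     # object (3, 3) [0, 0, 1]
```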
Since the gender column contains text values, we have to encode it as numbers using LabelEncoder:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
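The sketch below shows the mapping `LabelEncoder` produces, using made-up gender values. Classes are sorted, so "Female" becomes 0 and "Male" becomes 1. scikit-learn documents `LabelEncoder` as intended for target labels, and `OrdinalEncoder` or `OneHotEncoder` are the usual choices for feature columns. For a two-valued column like this one, though, the result is an equivalent 0/1 encoding.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical gender values, for illustration only
gender = np.array(["Male", "Female", "Female", "Male"], dtype=object)

le = LabelEncoder()
codes = le.fit_transform(gender)

print(list(le.classes_))              # ['Female', 'Male'] -> encoded as 0 and 1
print(codes)                          # [1 0 0 1]
print(le.inverse_transform(codes))    # back to the original strings
```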
Now we split the dataset into a training set and a test set. With test_size = 0.20, 20% of the 400 rows go to the test set, so the training set contains 320 samples and the test set contains 80 samples:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
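The sketch below confirms the 320/80 arithmetic using stand-in arrays with 400 rows (placeholders, not the real data). It also shows the optional `stratify=y` argument, which keeps the class proportions the same in both splits. The original code does not use `stratify`, so treat it as an optional variation.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with the same number of rows (400) as the dataset
X = np.arange(400 * 3).reshape(400, 3)
y = np.arange(400) % 2  # 200 samples of each class

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
print(X_train.shape, X_test.shape)  # (320, 3) (80, 3)

# Optional: a stratified split keeps the 50/50 class ratio in the test set
_, _, _, y_test_s = train_test_split(X, y, test_size=0.20, random_state=0, stratify=y)
print(np.bincount(y_test_s))        # [40 40]
```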
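Next, we apply feature scaling to the independent variables. The scaler is fitted on the training set only and then applied to the test set, so no information from the test set leaks into training. Strictly speaking, decision trees do not need feature scaling, because each split compares a single feature against a threshold, so this step is optional here.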
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
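To illustrate why scaling is optional for trees, this sketch fits one tree on raw features and one on standardized features, then compares their test predictions. The data is synthetic: integer "age" and "salary" columns and a made-up labelling rule, not the article's dataset. Standardization preserves the ordering of values within each feature, so both trees find equivalent splits.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic integer-valued "age" and "salary" columns (made up, not the real data)
X = np.column_stack([rng.integers(18, 61, size=300),
                     rng.integers(15, 151, size=300) * 1000]).astype(float)
y = ((X[:, 0] > 40) & (X[:, 1] > 60000)).astype(int)  # hypothetical labelling rule
X_train, X_test, y_train, y_test = X[:240], X[240:], y[:240], y[240:]

# Tree on raw features
raw = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Tree on standardized features (scaler fitted on the training set only)
sc = StandardScaler()
scaled = DecisionTreeClassifier(random_state=0).fit(sc.fit_transform(X_train), y_train)

pred_raw = raw.predict(X_test)
pred_scaled = scaled.predict(sc.transform(X_test))
print(np.array_equal(pred_raw, pred_scaled))  # True
```

Keeping the scaling step is harmless. It matters for distance-based models such as KNN, but not for the tree itself.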
Now it's time to fit the decision tree classifier to the training set:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

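We set criterion = 'entropy'. This is the function used to measure the quality of a split at each node: with entropy, the tree picks the split with the highest information gain. The scikit-learn default is criterion = 'gini'. Setting random_state = 0 makes the fitted tree reproducible.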
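Here is a sketch comparing the two criteria on synthetic data from `make_classification` (a stand-in for the real dataset, so the numbers are not the article's results). It then prints the top of the entropy tree as text rules with `export_text`. The feature names `x0` and `x1` are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-feature data standing in for the real dataset
X, y = make_classification(n_samples=400, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0).fit(X_train, y_train)
    print(criterion, "depth:", clf.get_depth(), "leaves:", clf.get_n_leaves(),
          "test accuracy:", round(clf.score(X_test, y_test), 3))

# After the loop, clf is the entropy tree; show its first two levels as if/else rules
rules = export_text(clf, feature_names=["x0", "x1"], max_depth=2)
print(rules)
```

Without a limit such as `max_depth`, the tree keeps splitting until its leaves are pure. That is why it fits the training data perfectly here.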
Our model is trained; now we predict the output for the test set:
y_pred = classifier.predict(X_test)
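Under the hood, `predict` returns the majority class of the leaf each sample falls into. The sketch below, on synthetic data with a depth-limited tree (both assumptions for illustration), checks that `predict` matches the class with the highest value from `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; max_depth=3 keeps some leaves impure so probabilities are not all 0/1
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=1)
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
clf.fit(X[:160], y[:160])

proba = clf.predict_proba(X[160:])  # class fractions in the leaf each sample lands in
pred = clf.predict(X[160:])

# predict() picks the class with the highest leaf probability
print(np.array_equal(pred, clf.classes_[proba.argmax(axis=1)]))  # True
print(proba[:3])
```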

Comparing the true and predicted values:

[Figure: predicted values (y_pred) shown side by side with the actual test values (y_test)]
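One simple way to produce a side-by-side comparison like this is to stack the two arrays as columns. The sketch below uses a handful of hypothetical labels instead of the real predictions and also lists the rows where the model was wrong.

```python
import numpy as np

# Hypothetical predictions and true labels, for illustration only
y_pred = np.array([0, 1, 0, 1, 1])
y_test = np.array([0, 1, 1, 1, 0])

comparison = np.column_stack((y_pred, y_test))  # column 0: predicted, column 1: actual
print(comparison)
print("mismatches at rows:", np.flatnonzero(y_pred != y_test).tolist())  # [2, 4]
```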

We can evaluate our model with a confusion matrix and the accuracy score, comparing the predicted and actual test values:
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
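In scikit-learn's confusion matrix, rows are the actual classes and columns are the predicted classes. Accuracy is the diagonal (correct predictions) divided by the total. The sketch below verifies this on hypothetical labels (0 = did not purchase, 1 = purchased), not on the article's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Hypothetical labels: 0 = did not purchase, 1 = purchased
y_test = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0])

cm = confusion_matrix(y_test, y_pred)  # rows = actual class, columns = predicted class
tn, fp, fn, tp = cm.ravel()            # binary case: [[tn, fp], [fn, tp]]
print(cm)                              # [[3 1]
                                       #  [1 3]]
print("accuracy from cm:", (tn + tp) / cm.sum())           # 0.75
print("accuracy_score:  ", accuracy_score(y_test, y_pred))  # 0.75
```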
Confusion matrix:

[Figure: confusion matrix of the decision tree classifier on the test set]

Accuracy score (ac): 0.91

Our model achieves good accuracy: about 91% of the test samples are classified correctly.

Full Code - 
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

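# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the gender column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])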

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Decision Tree Classification model on the Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)

    Next, we will cover other classification techniques in machine learning, including SVM (Support Vector Machines). Stay tuned, practice the algorithms and go through the theory. You can also read up on the elbow method (we will discuss it in the K-Means clustering article).

 HAPPY READING !!!!
