Decision Tree Classification - A Practice problem || Machine Learning

In this part of our classification series, we will get familiar with the Decision Tree classification algorithm. If you need to know about regression and other classification algorithms, check our previous posts.

As we discussed in our decision tree regression article, it's a method of splitting the dataset into smaller and smaller subsets, thus forming a tree-like structure. Some of the key terms in Decision Tree are -

Splitting - The process of dividing the dataset into smaller sub-units.

Parent and Child Node - A node which gets divided into several sub-nodes is a parent node, and the sub-nodes formed are called child nodes. If the parent node is the starting point of the entire splitting, it is called the Root Node.

Subtree / Branch - If a sub-node splits again into further sub-nodes, that entire part (one parent with its children) is called a subtree. It is a part of the entire tree.

Decision Node - If a sub-node splits into further sub-nodes, then that sub-node is called a decision node.

Terminal / Leaf Node - A bottom-level node which does not split further is called a terminal / leaf node.

Pruning - It is the opposite of splitting, that is, the process of removing sub-nodes.

In decision tree classification, the best attribute is selected using an attribute selection measure; this attribute becomes the decision node and the splitting takes place on it. This process continues until a stopping condition is met, for example when every node is pure or no attributes remain to split on.
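To make the idea of an attribute selection measure concrete, here is a minimal sketch that uses information gain (the reduction in entropy after a split) to pick the better of two attributes. The toy arrays and the helper names entropy and information_gain are made up for illustration; they are not part of the dataset or of scikit-learn.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels):
    """Entropy of `labels` minus the weighted entropy after splitting on `feature`."""
    total = entropy(labels)
    weighted = 0.0
    for value in np.unique(feature):
        mask = feature == value
        # mask.mean() is the fraction of samples that fall into this branch
        weighted += mask.mean() * entropy(labels[mask])
    return total - weighted

# Hypothetical toy data: two candidate attributes and a binary target
is_adult = np.array([0, 0, 1, 1, 1, 1])
is_male = np.array([0, 1, 0, 1, 0, 1])
bought = np.array([0, 0, 1, 1, 1, 1])

gains = {
    "is_adult": information_gain(is_adult, bought),
    "is_male": information_gain(is_male, bought),
}
best_attribute = max(gains, key=gains.get)
print(gains)
print("Best attribute to split on:", best_attribute)
```

In this toy case is_adult separates the classes perfectly, so it gets the higher gain and would become the decision node, while is_male gives no gain at all.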

Example

Let's start our Python implementation for Decision Tree classification. We are using the same dataset, the Social Network Ads dataset, that we used in the previous classification problem, to get an idea about different classification algorithms. The dataset contains the details of users of a social networking site, and the task is to predict whether a user buys a product by clicking the ad on the site, based on their salary, age and gender.

Let's start the programming by importing the essential libraries required for our problem.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

Import the dataset and slice it into independent and dependent variables.

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
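If the position-based slicing is unclear, the following sketch shows what iloc picks out on a tiny stand-in table. The column names and values here are invented for illustration and are only assumed to resemble the real CSV.

```python
import pandas as pd

# Hypothetical stand-in for the CSV; names and values are invented
toy = pd.DataFrame({
    "User ID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 35, 26],
    "EstimatedSalary": [19000, 20000, 43000],
    "Purchased": [0, 0, 1],
})

X_toy = toy.iloc[:, [1, 2, 3]].values  # columns at positions 1, 2 and 3
y_toy = toy.iloc[:, -1].values         # the last column

print(X_toy)
print(y_toy)
```

Position 0 (the ID column in this stand-in) is skipped because an identifier carries no information about the outcome.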

Since our dataset contains a character (string) variable, the gender column, we have to encode it as numbers using LabelEncoder.

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
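As a quick check of what LabelEncoder does, this small sketch encodes a made-up gender array. LabelEncoder sorts the distinct values and numbers them from 0; the sample values are assumptions, not rows from the dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up sample values for illustration
genders = np.array(["Male", "Female", "Female", "Male"])

encoder = LabelEncoder()
codes = encoder.fit_transform(genders)

print(encoder.classes_)                  # distinct labels in sorted order
print(codes)                             # integer code for each entry
print(encoder.inverse_transform(codes))  # back to the original strings
```

The classes_ attribute is handy later if you need to recall which integer stands for which label.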

We perform a train-test split on the dataset. We are providing the test size as 0.20, which means the training set contains 320 samples and the test set contains 80 samples.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
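The 320 / 80 figures can be verified with a small sketch on placeholder data of the same size. The arrays below are synthetic and only assume that the dataset has 400 rows and 3 feature columns.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 400 rows, 3 features, alternating labels
X_demo = np.arange(400 * 3).reshape(400, 3)
y_demo = np.arange(400) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

print(X_tr.shape, X_te.shape)
```

Fixing random_state makes the shuffle repeatable, so the same rows land in the test set on every run.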

Next, we are doing feature scaling to the training and test set of independent variables.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
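Note that the scaler is fitted on the training set only, and the test set is transformed with the same mean and standard deviation. A minimal sketch with made-up numbers shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up age / salary values for illustration
train = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
test = np.array([[30.0, 40000.0], [50.0, 80000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the train statistics

print(scaler.mean_)
print(train_scaled)
print(test_scaled)
```

The first test row equals the training mean, so it maps to zeros; the second row lies outside the training range and therefore gets values above those of any training row.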

It's time to fit our decision tree algorithm to the training set.

from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

We are taking the criterion parameter as entropy. The criterion is the function used to measure the quality of a split at each node.

Our model is created; now we have to predict the output for the test set.
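To see the entropy criterion at work inside scikit-learn, the sketch below fits a tree on four made-up points and reads the impurity stored for each node. The data is an assumption chosen so that a single split separates the classes.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Four made-up points: small values belong to class 0, large values to class 1
X_small = np.array([[1.0], [2.0], [3.0], [4.0]])
y_small = np.array([0, 0, 1, 1])

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
tree.fit(X_small, y_small)

root_entropy = tree.tree_.impurity[0]     # impurity of the root node
leaf_entropies = tree.tree_.impurity[1:]  # impurity of the remaining nodes

print(root_entropy, leaf_entropies, tree.get_depth())
```

The root holds an even mix of both classes, which is the highest entropy a two-class node can have, and one split brings both children down to zero.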

y_pred = classifier.predict(X_test)

Comparing the true and predicted values (y_pred alongside y_test):
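One simple way to put the two arrays next to each other is np.column_stack. The short arrays below are made-up stand-ins for y_pred and y_test.

```python
import numpy as np

# Made-up stand-ins for y_pred and y_test
pred_demo = np.array([0, 1, 0, 0, 1])
true_demo = np.array([0, 1, 1, 0, 1])

side_by_side = np.column_stack((pred_demo, true_demo))  # columns: predicted, actual
mismatches = np.flatnonzero(pred_demo != true_demo)     # row indices that disagree

print(side_by_side)
print("Misclassified rows:", mismatches)
```

The same two lines work on the real y_pred and y_test once the model has produced its predictions.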

We can evaluate our model using the confusion matrix and the accuracy score, both of which compare the predicted and actual test values.

from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)

confusion matrix -

ac - 0.91

Our model achieves good accuracy: about 91% of the test samples are classified correctly.
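The accuracy score is simply the diagonal of the confusion matrix divided by the total number of samples. The sketch below shows this on made-up labels; the numbers are illustrative and are not the results of the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up labels for illustration (not the model's real output)
actual = np.array([0, 0, 0, 1, 1, 1, 1, 0])
predicted = np.array([0, 0, 1, 1, 1, 0, 1, 0])

cm_demo = confusion_matrix(actual, predicted)
tn, fp, fn, tp = cm_demo.ravel()  # rows are actual classes, columns are predicted

accuracy_from_cm = (tn + tp) / cm_demo.sum()

print(cm_demo)
print(accuracy_from_cm, accuracy_score(actual, predicted))
```

Reading the off-diagonal cells tells you which kind of mistake the model makes, something the single accuracy number hides.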

Full Code -

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

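# Importing the dataset (gender, age and salary columns as features)
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the gender column, as in the walkthrough above
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])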

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Decision Tree Classification model on the Training set
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)

Next we will cover other classification techniques in machine learning, including SVM (Support Vector Machines). Stay tuned, practice the algorithms, and go through the theory. You can also read up on the elbow method (we will discuss it in the K-Means clustering article).

HAPPY READING !!!!
