Random Forest Classification - A Practice Problem


We have already discussed random forest in the regression part. In this article we apply the random forest algorithm to a classification problem. Random forest is an ensemble algorithm that combines multiple decision trees to predict the value for a new data point. Let's go through the steps involved in random forest classification.

Steps for Random Forest Classification

1. Pick k data points at random from the dataset.
2. Build the decision tree associated with these k data points.
3. Choose the number Ntree of trees you want to build and repeat steps 1 and 2.
4. For a new data point, make each one of the Ntree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
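The steps above can be sketched by hand with plain decision trees. This is only an illustration under stated assumptions: the toy data, the helper names fit_forest and predict_forest, and the choice of sampling with replacement are all hypothetical, and class labels are assumed to be non-negative integers. scikit-learn's RandomForestClassifier differs in details (for example, it also randomises the features considered at each split and averages predicted probabilities rather than counting hard votes).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=10, k=None, seed=0):
    """Steps 1-3: build n_trees trees, each on k randomly picked data points."""
    rng = np.random.default_rng(seed)
    k = len(X) if k is None else k
    trees = []
    for _ in range(n_trees):
        # Step 1: pick k data points at random (here: with replacement)
        idx = rng.integers(0, len(X), size=k)
        # Step 2: build the decision tree associated with these points
        tree = DecisionTreeClassifier(criterion="entropy", random_state=0)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X_new):
    """Step 4: every tree votes and the majority category wins."""
    # votes has shape (n_trees, n_samples)
    votes = np.stack([tree.predict(X_new) for tree in trees]).astype(int)
    return np.array([np.bincount(column).argmax() for column in votes.T])


# Hypothetical toy data: two features, class 1 when their sum is positive
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 2))
y_toy = (X_toy[:, 0] + X_toy[:, 1] > 0).astype(int)

trees = fit_forest(X_toy, y_toy, n_trees=10)
pred = predict_forest(trees, X_toy)
print("Agreement with the toy labels:", (pred == y_toy).mean())
```

With an even number of trees a tie is possible; in this sketch np.bincount(...).argmax() resolves a tie in favour of the lower class label.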

Let's go through an example

Let's practice random forest classification by creating a model for the Social Network Ads dataset, the same dataset we used in the previous classification algorithms. It contains details of users of a social networking site, and the goal is to predict whether a user buys a product after clicking the ad on the site, based on their salary, age and gender.


Let's start the programming

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

Importing the dataset and slicing it into X and y

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

Since our dataset contains a character variable (gender), we have to encode it using LabelEncoder

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
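To see what the encoder does, here is a small standalone example. The gender values below are hypothetical stand-ins for the dataset column; LabelEncoder sorts the distinct values and maps each one to an integer.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical values standing in for the gender column
gender = np.array(["Male", "Female", "Female", "Male"])

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(gender)

print(le_demo.classes_)  # ['Female' 'Male'] (sorted, so Female -> 0, Male -> 1)
print(encoded)           # [1 0 0 1]

# The mapping can be reversed when the original labels are needed again
decoded = le_demo.inverse_transform(encoded)
print(decoded)
```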

We now perform the train-test split. We set the test size to 0.20, which means the training set contains 320 samples and the test set contains 80 samples

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
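The 320/80 figures can be checked on any array with 400 rows. The sketch below uses placeholder data (X_demo and y_demo are hypothetical, not the real dataset) just to show the resulting shapes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 400 rows, the size implied by the 320/80 split
X_demo = np.arange(800).reshape(400, 2)
y_demo = np.arange(400) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

print(X_tr.shape, X_te.shape)  # (320, 2) (80, 2)
```

Fixing random_state means the same rows land in the training and test sets on every run, which keeps the results reproducible.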

The next step is performing feature scaling on the dataset

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
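Note that the scaler is fitted on the training set only and then reused on the test set. A minimal sketch with made-up age and salary columns (the value ranges are assumptions for illustration) shows what that means numerically:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical age and salary columns on very different scales
train = np.column_stack([rng.uniform(18, 60, 320), rng.uniform(15000, 150000, 320)])
test = np.column_stack([rng.uniform(18, 60, 80), rng.uniform(15000, 150000, 80)])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training mean and std

# After scaling, each training column has mean ~0 and standard deviation ~1
print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))

# The test set is scaled with the training statistics, not its own
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```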

It's time to fit our random forest algorithm to the training set

from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

We set the criterion parameter to entropy; it is the function used to measure the quality of a split at each node. We set n_estimators to 10, which is the number of decision trees in the forest
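To make the entropy criterion more concrete, here is a small hand-written sketch of Shannon entropy and the information gain of a split. The helper names entropy and information_gain and the toy labels are hypothetical; this is not scikit-learn's internal code, only the idea behind it.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, left, right):
    """Entropy of the parent node minus the weighted entropy of its children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted


# Toy node with four samples of each class
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])

print(entropy(parent))                                    # 1.0 (maximally mixed)
print(information_gain(parent, parent[:4], parent[4:]))   # 1.0 (perfect split)
print(information_gain(parent, parent[::2], parent[1::2]))  # 0.0 (useless split)
```

A split that separates the classes cleanly has a high gain, while a split whose children are as mixed as the parent has a gain of zero.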

Next is predicting the output for the test set

y_pred = classifier.predict(X_test)

Comparing the true and predicted values:

y_pred y_test
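One way to view the two arrays side by side is to put them in a pandas DataFrame. The short arrays below are made-up placeholders, not the model's real output.

```python
import numpy as np
import pandas as pd

# Placeholder values for illustration only
y_test_demo = np.array([0, 1, 0, 1, 1])
y_pred_demo = np.array([0, 1, 1, 1, 0])

comparison = pd.DataFrame({"y_pred": y_pred_demo, "y_test": y_test_demo})
comparison["match"] = comparison["y_pred"] == comparison["y_test"]

print(comparison)
print(comparison["match"].sum(), "of", len(comparison), "predictions match")
```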

We can evaluate our model using the confusion matrix and accuracy score, comparing the predicted and actual test values

from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)

confusion matrix -

ac - 0.9125

Our model has good accuracy: about 91% of the test samples are classified correctly
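The accuracy score and the confusion matrix are directly related: accuracy is the sum of the diagonal of the matrix divided by the total number of samples. The small example below uses made-up labels (not the results above) to show this.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up labels for illustration only
y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_hat_demo = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

cm_demo = confusion_matrix(y_true_demo, y_hat_demo)
print(cm_demo)
# [[4 1]
#  [1 4]]
# Rows are the actual classes, columns are the predicted classes

acc_from_cm = np.trace(cm_demo) / cm_demo.sum()   # correct predictions / all
print(acc_from_cm, accuracy_score(y_true_demo, y_hat_demo))  # 0.8 0.8
```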

Full Code -

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

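# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the categorical (gender) column
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])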

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Random Forest Classification model on the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix and finding accuracy score
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)

Now that we have dealt with this kind of classification, in the upcoming post we will discuss support vector machines, going through the theory and also a practice problem. Do not forget to work out the problem.

Happy Reading!!!
