Random Forest Classification - A Practice Problem

We already discussed random forests in the regression part. In this article we apply the random forest algorithm to a classification problem. Random forest is an ensemble algorithm that combines multiple decision trees to predict the value for a new data point. Let's go through the steps involved in random forest classification.

 Steps for Random Forest Classification

  1. Pick K data points at random from the training set.
  2. Build the decision tree associated with these K data points.
  3. Choose the number Ntree of trees you want to build and repeat steps 1 and 2.
  4. For a new data point, make each of the Ntree trees predict the category to which the data point belongs, and assign the new data point to the category that wins the majority vote.
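
The four steps above can be sketched by hand on top of scikit-learn's DecisionTreeClassifier. This is only an illustration under assumptions: the helper names fit_forest and predict_forest are made up for this post, the data is synthetic, and max_features="sqrt" is added so that each split looks at a random subset of features, as random forests usually do. In practice you would use RandomForestClassifier directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=10, k=None, seed=0):
    """Steps 1-3: train n_trees trees, each on k randomly drawn data points."""
    rng = np.random.default_rng(seed)
    k = len(X) if k is None else k
    trees = []
    for _ in range(n_trees):
        # Step 1: draw k row indices at random, with replacement.
        idx = rng.integers(0, len(X), size=k)
        # Step 2: build a decision tree on those k points.
        tree = DecisionTreeClassifier(
            max_features="sqrt", random_state=int(rng.integers(1_000_000))
        )
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_forest(trees, X):
    """Step 4: every tree votes and the most frequent category wins."""
    votes = np.stack([tree.predict(X) for tree in trees]).astype(int)
    # votes has shape (n_trees, n_samples); count the votes per sample.
    return np.array([np.bincount(column).argmax() for column in votes.T])


# Synthetic two-class data, used only to exercise the sketch.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
trees = fit_forest(X_demo, y_demo, n_trees=10)
pred_demo = predict_forest(trees, X_demo)
print("training accuracy:", (pred_demo == y_demo).mean())
```

The vote counting assumes the class labels are non-negative integers, which holds for the 0/1 labels used here.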


Let's go through an example

Let's practice random forest classification by building a model for the Social Network Ads dataset, the same dataset we used for the previous classification algorithms. It contains details of users of a social networking site, and the goal is to predict whether a user buys a product after clicking the ad on the site, based on their salary, age and gender.




Let's start programming. First we import the libraries:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
Next we import the dataset and slice it into the features X and the target y:
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
Since our dataset contains a character (string) variable, we have to encode it as numbers using LabelEncoder:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
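
To see what the encoder does, here is a small standalone sketch. The four values in the gender array are made up for illustration and are not read from the dataset.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up sample values, only for illustration.
gender = np.array(["Male", "Female", "Female", "Male"])

encoder = LabelEncoder()
codes = encoder.fit_transform(gender)

print(encoder.classes_)  # the categories, stored in sorted order
print(codes)             # the integer code assigned to each row

# The mapping can be reversed when the original labels are needed again.
decoded = encoder.inverse_transform(codes)
print(decoded)
```

Because the categories are sorted alphabetically, "Female" gets code 0 and "Male" gets code 1 in this example.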
Next we perform the train/test split. We set the test size to 0.20, which means that out of the 400 rows, the training set contains 320 samples and the test set contains 80.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
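
As a quick check of the 320/80 arithmetic, the sketch below splits a placeholder array with 400 rows. The array contents are dummy values, not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 400 placeholder rows with 2 features each, plus dummy 0/1 labels.
X_demo = np.arange(800).reshape(400, 2)
y_demo = np.arange(400) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)
print(X_tr.shape, X_te.shape)  # (320, 2) (80, 2)
```

Fixing random_state makes the split reproducible, so the same rows land in the test set on every run.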
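The next step is feature scaling. Tree-based models such as random forests do not strictly need it, since their splits depend only on the ordering of feature values, but it does no harm here.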
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
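
The scaler is fitted on the training set only and then reused on the test set, so the test data never influences the scaling parameters. A minimal sketch with made-up age and salary values shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up (age, salary) rows, only for illustration.
train = np.array([[20.0, 20000.0], [30.0, 40000.0], [40.0, 60000.0]])
test = np.array([[30.0, 40000.0], [50.0, 80000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the train mean and std

print(scaler.mean_)               # per-column mean of the training rows
print(train_scaled.mean(axis=0))  # approximately 0 for every column
print(train_scaled.std(axis=0))   # approximately 1 for every column
print(test_scaled)
```

The first test row equals the training mean, so it is scaled to zeros. The second row lies outside the training range and gets values above 1.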
Now it's time to fit our random forest classifier to the training set:
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

We set the criterion parameter to 'entropy'. The criterion is the function used to measure the quality of a split at each node. We set n_estimators to 10, which is the number of decision trees in the forest.
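
To make the entropy criterion concrete, here is a small hand-written sketch of entropy and information gain for a single split. The helper names entropy and information_gain are made up for this example and the labels are invented. scikit-learn computes this internally, so the code is only meant to show the idea.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of the class labels in one node."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children


# Invented node with four samples of each class.
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# A split that separates the classes perfectly removes all the entropy.
good_gain = information_gain(parent, parent[:4], parent[4:])

# A split that leaves both children as mixed as the parent gains nothing.
bad_gain = information_gain(parent, parent[::2], parent[1::2])

print(entropy(parent), good_gain, bad_gain)
```

At each node the tree prefers the split with the larger gain, so the first split would be chosen over the second.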
Next, we predict the output for the test set:
y_pred = classifier.predict(X_test)
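
One detail worth knowing: scikit-learn's RandomForestClassifier averages the class probabilities predicted by its trees and picks the class with the highest average, instead of counting one hard vote per tree. The sketch below uses synthetic data (not the ad dataset) to show that predict agrees with the largest column of predict_proba.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class data, used only for illustration.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)

forest = RandomForestClassifier(n_estimators=10, criterion="entropy", random_state=0)
forest.fit(X_demo, y_demo)

proba = forest.predict_proba(X_demo[:5])  # one column per class, rows sum to 1
labels = forest.predict(X_demo[:5])

# Picking the most probable class by hand gives the same labels.
from_proba = forest.classes_[proba.argmax(axis=1)]
print(proba)
print(labels, from_proba)
```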

Comparing the true and predicted values:

[Figure: the y_pred and y_test arrays shown side by side]

We can evaluate our model using the confusion matrix and the accuracy score, both of which compare the predicted values with the actual test values:
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
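
The two metrics are closely related: the accuracy is the sum of the diagonal of the confusion matrix divided by the total number of samples. A small sketch with six invented labels shows the link.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Six invented true and predicted labels, only for illustration.
y_true = np.array([0, 0, 0, 1, 1, 1])
y_hat = np.array([0, 1, 0, 1, 1, 0])

# Rows are the true classes, columns are the predicted classes.
cm_demo = confusion_matrix(y_true, y_hat)
print(cm_demo)

# Correct predictions sit on the diagonal.
acc_from_cm = np.trace(cm_demo) / cm_demo.sum()
print(acc_from_cm, accuracy_score(y_true, y_hat))
```

Here four of the six predictions are correct, so both ways of computing the accuracy give 4/6.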
confusion matrix -


ac - 0.9125

Our model has good accuracy: it classifies about 91% of the 80 test samples (73 of them) correctly.



Full Code - 
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

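# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the categorical gender column (column 0 of X)
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])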

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Random Forest Classification model on the Training set
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix and finding accuracy score
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
  

Now that we have dealt with this kind of classification, in the upcoming post we will discuss Support Vector Machines, going through the theory and a practice problem. Do not forget to work through the problem yourself.

Happy Reading!!!



