Random Forest Regression - A Practice Problem

  In this article we look at an ensemble algorithm, Random Forest regression. Ensemble methods combine multiple models into a new model with improved accuracy. Random forest regression applies this idea by combining multiple decision tree regression models into one more accurate model. Because the forest averages many trees, it reduces the overfitting that a single decision tree is prone to (overfitting means the model fits the given data closely but fails on unseen data).

Steps for Random Forest Regression
  1. Pick K data points at random from the training set.
  2. Build the decision tree associated with these K data points.
  3. Choose the number of trees, Ntree, that you want to build, and repeat steps 1 and 2.
  4. For a new data point, make each of the Ntree trees predict the value of Y for that data point, and assign the new data point the average of all the predicted Y values.
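
The four steps above can be sketched by hand with scikit-learn's DecisionTreeRegressor. This is only an illustrative sketch, not how RandomForestRegressor is implemented internally: the helper names fit_forest and predict_forest and the toy data are hypothetical, and drawing the K points with replacement is an assumption, since the steps do not say how the points are picked.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_forest(X, y, n_trees=10, k=None, seed=0):
    """Steps 1-3: fit n_trees decision trees, each on k randomly drawn points."""
    rng = np.random.default_rng(seed)
    k = len(X) if k is None else k
    trees = []
    for _ in range(n_trees):
        # Step 1: pick k data points at random (assumption: with replacement)
        idx = rng.integers(0, len(X), size=k)
        # Step 2: build a decision tree on those k points
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees


def predict_forest(trees, X_new):
    """Step 4: average the predictions of all trees."""
    return np.mean([tree.predict(X_new) for tree in trees], axis=0)


# Hypothetical toy data, used only to make the sketch runnable
X_toy = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_toy = X_toy.ravel() ** 2

trees = fit_forest(X_toy, y_toy, n_trees=10)
pred = predict_forest(trees, np.array([[7.5]]))
print(pred)
```

Each tree sees a slightly different sample, so the individual predictions differ, and averaging them smooths out the quirks of any single tree.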

 EXAMPLE Let's go through an example problem to get an overview of how the model is built and how it predicts values. We are using the same dataset (download dataset) that we used for polynomial regression and decision tree regression, so that we can compare the different regression algorithms. As discussed in the earlier article, it is a beginner-friendly dataset of salaries for different positions in a company.

So let's start our model creation.

Have a look at the dataset.


Now let's move on to the code. First, import the required libraries.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

Next, import the dataset and slice it into the independent variable (X) and the dependent variable (y). X takes every column except the first and the last, and y takes the last column.

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

Since the dataset contains only a small amount of data, we are not doing a train/test split. To create the model, we import the RandomForestRegressor class from the scikit-learn library.

from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(X, y)

Here we pass two parameters, n_estimators and random_state. We have already discussed random_state in our previous articles; n_estimators is the number of trees used to build the model. There are many other parameters, such as max_features, max_depth and min_samples_split, but since we are just getting started we use only a few of them.
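
To give a feel for the other parameters mentioned above, here is a small sketch that sets a few of them explicitly. The values are arbitrary illustrations, not recommendations, and the toy data is hypothetical so that the snippet runs without the CSV file.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data, used only to make the sketch runnable
X_toy = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_toy = X_toy.ravel() ** 2

tuned = RandomForestRegressor(
    n_estimators=50,       # number of trees in the forest
    max_depth=3,           # each tree may grow at most 3 levels deep
    min_samples_split=2,   # a node needs at least 2 samples to be split
    max_features=1.0,      # fraction of features considered at each split
    random_state=0,        # makes the result reproducible
)
tuned.fit(X_toy, y_toy)

# The fitted trees are available in estimators_
depths = [tree.get_depth() for tree in tuned.estimators_]
print(len(tuned.estimators_), max(depths))
```

Limiting max_depth makes each tree simpler, which is one common way to trade a little training accuracy for less overfitting.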
Now let's analyse our predictions graphically using matplotlib.

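# Build a fine grid of position levels so the step-shaped prediction is visible.
# X.min() and X.max() return scalars, which is what np.arange expects.
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))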
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

Graphical representation:


Let's predict the salary for position level 7.5 using the trained model.
regressor.predict([[7.5]])
  
Output : 240000
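
To see where such a prediction comes from, the sketch below compares the forest's output with the average of its individual trees, which scikit-learn exposes through the estimators_ attribute. The toy data is hypothetical, so the numbers it prints will not match the salary dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data, used only to make the sketch runnable
X_toy = np.arange(1, 11, dtype=float).reshape(-1, 1)
y_toy = X_toy.ravel() ** 2

forest = RandomForestRegressor(n_estimators=10, random_state=0)
forest.fit(X_toy, y_toy)

query = np.array([[7.5]])

# Ask every tree in the forest for its own prediction
per_tree = np.array([tree.predict(query)[0] for tree in forest.estimators_])

# The forest's prediction is the mean of the per-tree predictions
forest_pred = forest.predict(query)[0]
print(per_tree)
print(per_tree.mean(), forest_pred)
```

The two values on the last line agree, which is step 4 of the algorithm in action.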

Full Code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

# Training the Random Forest Regression model on the whole dataset
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 10, random_state = 0)
regressor.fit(X, y)

# Visualising the Random Forest Regression results (higher resolution)
# X.min() and X.max() return scalars, which is what np.arange expects
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.style.use('dark_background')
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Random Forest Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

# Predicting a new result
# print() is needed to see the result when this runs as a script
print(regressor.predict([[7.5]]))

So this ensemble model, which brings decision trees together to form a random forest, is generally a stronger model than a single decision tree. It is not hard to understand once you get the concept of decision trees.
Later we will begin with classification techniques and learn more about the algorithms. The problems may be easy to understand; the complexity, where it exists, lies in the algorithms. We will try our best to convey the concepts. Do not forget to work through this model yourself.

KEEP READING!!

