Boston house price prediction with Decision Tree Regression || Machine Learning


     In our previous machine learning article we discussed decision tree regression. So let's build a model for Boston house price prediction using decision tree regression.

    We will use the Boston house price dataset for this, loaded through scikit-learn.

About Dataset

    It's a dataset of 506 rows and 13 feature columns, plus a target column holding the price of the house.

Let's start the implementation by importing the essential libraries.

import pandas as pd
import numpy 
import matplotlib.pyplot as plt
import sklearn

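Next, we load the dataset using the sklearn library.

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston
dataset = load_boston()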


The dataset has 13 features; let's print their names.

print(dataset.feature_names)

Output:

['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B' 'LSTAT']

These are abbreviations of the feature names; here is what each one stands for:

CRIM     - Per capita crime rate by town

ZN       - Proportion of residential land zoned for lots over 25,000 sq. ft.

INDUS    - Proportion of non-retail business acres per town

CHAS     - Charles River dummy variable (1 if tract bounds river, 0 otherwise)

NOX      - Nitric oxides concentration (parts per 10 million)

RM       - Average number of rooms per dwelling

AGE      - Proportion of owner-occupied units built prior to 1940

DIS      - Weighted distances to five Boston employment centers

RAD      - Index of accessibility to radial highways

TAX      - Full-value property tax rate per $10,000

PTRATIO  - Pupil-teacher ratio by town

B        - 1000(Bk - 0.63)^2, where Bk is the proportion of [people of African American descent] by town

LSTAT    - Percentage of lower status of the population

MEDV     - Median value of owner-occupied homes in $1000s (the target)


Creating the independent (X) and dependent (y) variables.

X = pd.DataFrame(dataset.data)
y = dataset.target

Checking whether there are any missing values

print(X.isnull().sum())

The output shows that there are no missing values in our dataset.
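Since the Boston data has no gaps, the check above prints only zeros. To see what the same call reports when values are missing, here is a minimal sketch on a small made-up frame. The column names are borrowed from the feature list, but the numbers and the names `toy`, `missing_per_column` and `total_missing` are hypothetical, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two deliberate gaps (not real Boston rows).
toy = pd.DataFrame({
    "RM": [6.5, np.nan, 5.9, 7.1],
    "AGE": [65.2, 78.9, np.nan, 45.8],
    "TAX": [296.0, 242.0, 222.0, 311.0],
})

# Count missing cells per column, exactly like X.isnull().sum() above.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# A single number is handy for a quick check: 0 means nothing is missing.
total_missing = int(missing_per_column.sum())
print("Total missing:", total_missing)
```

If the total is not zero, the gaps would need to be dropped or filled before fitting the model.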

Now let's split our dataset into training and testing sets

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
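To make the effect of test_size=0.2 and random_state=0 concrete, the sketch below runs the same split on placeholder arrays with the same shape as the dataset (506 rows, 13 features). The arrays are random numbers, and the names `X_demo`, `y_demo`, `X_tr` and so on are only for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the dataset described above; values are random.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(506, 13))
y_demo = rng.normal(size=506)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
print("train:", X_tr.shape, "test:", X_te.shape)

# Repeating the call with the same random_state reproduces the same rows.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
same_split = np.array_equal(X_te, X_te2)
print("same split:", same_split)
```

With 506 rows, scikit-learn rounds the 20% test share up, so this should give 102 test rows and 404 training rows. Fixing random_state is what makes the results repeatable between runs.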

In this problem we are using the decision tree algorithm, so let's create the decision tree regression model

from sklearn.tree import DecisionTreeRegressor
dtr = DecisionTreeRegressor()
dtr.fit(X_train, y_train)
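DecisionTreeRegressor() with no arguments leaves max_depth unset, so the tree keeps splitting until it fits the training data almost exactly. The sketch below, on made-up synthetic data, compares such a tree with one limited by max_depth=3. The data, the variable names and the choice of depth 3 are assumptions for illustration, not tuned values for the Boston problem.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (hypothetical, not the Boston dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 3))
y_demo = 2.0 * X_demo[:, 0] + np.sin(X_demo[:, 1]) + rng.normal(scale=0.5, size=200)

# Default settings: the tree grows until it cannot split any further.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_demo, y_demo)

# Limited depth: a smaller tree that cannot memorise every training row.
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_demo, y_demo)

# score() returns R^2; here it is measured on the training data itself.
print("full depth:", full_tree.get_depth(), "train R^2:", full_tree.score(X_demo, y_demo))
print("shallow depth:", shallow_tree.get_depth(), "train R^2:", shallow_tree.score(X_demo, y_demo))
```

A near-perfect training score from the full tree says little about unseen houses, which is why the model is checked on the test set next.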

We have created our model. Next, we predict the prices for the test set so that we can analyse the accuracy of our model.

pred_price = dtr.predict(X_test)
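The post does not show how the accuracy is measured, so here is one common way, using the error metrics from sklearn.metrics. The four prices below are hypothetical numbers chosen to keep the arithmetic easy to follow; in the real workflow you would pass y_test and pred_price instead.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical true and predicted prices (in $1000s), for illustration only.
y_true = [20.0, 30.0, 25.0, 35.0]
y_pred = [22.0, 27.0, 25.0, 36.0]

# Mean absolute error: average size of the mistakes.
mae = mean_absolute_error(y_true, y_pred)

# Root mean squared error: like MAE, but penalises large mistakes more.
rmse = mean_squared_error(y_true, y_pred) ** 0.5

# R^2: 1.0 is a perfect fit, 0.0 is no better than predicting the mean.
r2 = r2_score(y_true, y_pred)

print("MAE:", mae)
print("RMSE:", rmse)
print("R^2:", r2)
```

For these numbers the errors are 2, -3, 0 and 1, so the MAE is 1.5, the RMSE is the square root of 3.5, and R^2 is 0.888.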

Comparison of predicted output and original output (up to 8 values), shown as two columns: Prediction and Original.
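A side-by-side table like this can be built with a small pandas DataFrame. The sketch below does it on synthetic data so that it runs on its own; the data and the names `X_demo`, `y_demo`, `pred` and `comparison` are hypothetical. With the real model you would use pred_price and y_test in their place.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (not the Boston dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=1.0, size=100)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)
pred = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr).predict(X_te)

# Put predictions next to the original values and keep the first 8 rows.
comparison = pd.DataFrame({"Prediction": pred, "Original": y_te}).head(8)
print(comparison)
```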

Full code for the problem:

# importing libraries
import pandas as pd
import numpy
import matplotlib.pyplot as plt
import sklearn

# importing dataset
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this import needs an older scikit-learn release.
from sklearn.datasets import load_boston
dataset = load_boston()

# features
print(dataset.feature_names)

# as independent and dependent variables
X = pd.DataFrame(dataset.data)
y = dataset.target

# finding missing values
print(X.isnull().sum())

# train test splitting
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# model creation
from sklearn.tree import DecisionTreeRegressor
dtr = DecisionTreeRegressor()
dtr.fit(X_train, y_train)

# prediction
pred_price = dtr.predict(X_test)

Having worked through one more practice problem in decision tree regression, you should now have a better understanding than before. Review this algorithm and try to write the code on your own. The dataset can be imported through the library, so there is no need to download it. Try working it out...

KEEP READING !!!
