In our previous machine learning article we discussed decision tree regression. So let's create a model for Boston house price prediction using
decision tree regression. We will use the Boston house price dataset that ships with scikit-learn.
About Dataset
The dataset has 506 rows and 13 feature columns, along with a target column for the price of the house.
Let's start our model implementation by importing the essential libraries.
import pandas as pd
import numpy
import matplotlib.pyplot as plt
import sklearn
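Next we can load our dataset using the sklearn library. Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import below only works on older versions.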
from sklearn.datasets import load_boston
dataset = load_boston()
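On scikit-learn versions where load_boston is no longer available, the same arrays can be rebuilt from the original raw file. The sketch below is a minimal illustration, assuming the raw file stores each record on two rows (11 values, then 2 more features followed by the price); the helper name split_raw_boston and the toy array are our own and only stand in for the real file.

```python
import numpy as np

def split_raw_boston(raw_values):
    """Rebuild (data, target) from a two-rows-per-record raw layout (assumed)."""
    raw_values = np.asarray(raw_values, dtype=float)
    # Even rows hold the first 11 features; odd rows hold the last 2 features
    # followed by the target (MEDV) in the third column.
    data = np.hstack([raw_values[::2, :], raw_values[1::2, :2]])
    target = raw_values[1::2, 2]
    return data, target

# Toy stand-in for the raw file: 2 records spread over 4 rows (not real data).
nan = np.nan
toy_raw = np.array([
    list(range(11)),
    [11, 12, 24.0] + [nan] * 8,
    list(range(100, 111)),
    [111, 112, 21.6] + [nan] * 8,
])

data, target = split_raw_boston(toy_raw)
print(data.shape)   # one row per record, 13 feature columns
print(target)       # one price per record
```

With the real file, the scikit-learn deprecation notice suggested reading http://lib.stat.cmu.edu/datasets/boston with pd.read_csv(url, sep=r"\s+", skiprows=22, header=None) and passing the resulting values to a step like this one.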
The dataset has 13 features; let's print their names.
print(dataset.feature_names)
Output:
['CRIM' 'ZN' 'INDUS' 'CHAS' 'NOX' 'RM' 'AGE' 'DIS' 'RAD' 'TAX' 'PTRATIO' 'B'
'LSTAT']
These names are abbreviations of the features. Here are their full descriptions:
CRIM - Per capita crime rate by town
ZN - Proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS - Proportion of non-retail business acres per town
CHAS - Charles River dummy variable (1 if tract bounds river, 0 otherwise)
NOX - Nitric oxides concentration (parts per 10 million)
RM - Average number of rooms per dwelling
AGE - Proportion of owner-occupied units built prior to 1940
DIS - Weighted distances to five Boston employment centers
RAD - Index of accessibility to radial highways
TAX - Full-value property-tax rate per $10,000
PTRATIO - Pupil-teacher ratio by town
B - 1000(Bk - 0.63)^2, where Bk is the proportion of [people of African American descent] by town
LSTAT - Percentage of lower status of the population
MEDV - Median value of owner-occupied homes in $1000s (this is the target, not one of the 13 features)
Creating the independent (X) and dependent (y) variables.
X = pd.DataFrame(dataset.data)
y = dataset.target
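Built this way, the DataFrame labels its columns 0 to 12 rather than by feature name. As a small sketch of attaching the names, the example below uses a random stand-in array (an assumption, so that it runs without the dataset); with the real data you would pass dataset.data and dataset.feature_names instead.

```python
import numpy as np
import pandas as pd

# The 13 feature names printed earlier in the article.
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

# Random stand-in for dataset.data (5 rows, 13 columns); not real house data.
rng = np.random.default_rng(0)
stand_in = rng.random((5, len(feature_names)))

# Passing columns= gives every feature a readable label.
X_named = pd.DataFrame(stand_in, columns=feature_names)
print(X_named.columns.tolist())
```

Named columns make later steps such as the missing-value check easier to read, and the model is trained the same way either way.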
Checking whether there are any missing values
print(X.isnull().sum())
Output: every column reports a count of 0, so there are no missing values in our dataset.
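To see what this check looks like when something is missing, here is a minimal sketch on a made-up two-column frame (the values are illustrative, not taken from the dataset). isnull() marks each missing cell and sum() counts them per column.

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing value in RM (illustrative values only).
toy = pd.DataFrame({
    "RM": [6.5, np.nan, 5.9],
    "LSTAT": [4.98, 9.14, 4.03],
})

missing_per_column = toy.isnull().sum()      # count of missing cells per column
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("Total missing:", total_missing)
```

A non-zero count for any column would mean we need to fill or drop those rows before training.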
Now let's split our dataset into training and testing sets
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.2, random_state=0)
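With test_size=0.2, about 20% of the 506 rows are held back for testing. The short sketch below checks the resulting sizes on placeholder arrays of the same shape (zeros, used here only as an assumption so the example runs without the dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same shape as the Boston data (506 rows, 13 features).
X_demo = np.zeros((506, 13))
y_demo = np.zeros(506)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

print(X_tr.shape, X_te.shape)  # (404, 13) (102, 13)
```

Fixing random_state means the same rows land in the test set on every run, which keeps results comparable between experiments.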
In this problem we are using the decision tree algorithm, so let's create the decision tree regression model
from sklearn.tree import DecisionTreeRegressor
dtr = DecisionTreeRegressor()
dtr.fit(X_train,y_train)
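A DecisionTreeRegressor created with default settings grows until it fits the training data exactly, which can lead to overfitting. As a hedged sketch of how max_depth changes this, the example below trains two trees on synthetic data from make_regression (a stand-in we chose so the example runs anywhere, not the Boston data) and sets random_state so the runs are repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in with the same shape as the Boston data (not the real dataset).
X_demo, y_demo = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Unrestricted tree versus a tree limited to 3 levels.
full_tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)

for name, tree in [("full", full_tree), ("max_depth=3", shallow_tree)]:
    print(name,
          "depth:", tree.get_depth(),
          "train R^2:", round(tree.score(X_tr, y_tr), 3),
          "test R^2:", round(tree.score(X_te, y_te), 3))
```

Comparing the train and test scores of the two trees gives a quick feel for how much the unrestricted tree memorises the training set.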
We have created our model. Next, we predict the prices for the test values so that we can analyse the accuracy of our model.
pred_price = dtr.predict(X_test)
Comparison of predicted output and original output (up to 8 values)
Prediction Original
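A comparison table like this one, together with a couple of error measures, can be produced roughly as follows. This is only a sketch: it uses synthetic data from make_regression as a stand-in (our assumption, so it runs on any scikit-learn version), so the numbers it prints are not Boston house prices. The choice of mean absolute error and R^2 as metrics is also ours.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in with the same shape as the Boston data (not the real dataset).
X_demo, y_demo = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

dtr = DecisionTreeRegressor(random_state=0)
dtr.fit(X_train, y_train)
pred_price = dtr.predict(X_test)

# Side-by-side table of predicted and original values; show the first 8 rows.
comparison = pd.DataFrame({"Prediction": pred_price, "Original": y_test})
print(comparison.head(8))

# Two common regression metrics computed on the test set.
mae = mean_absolute_error(y_test, pred_price)
r2 = r2_score(y_test, pred_price)
print("MAE:", round(mae, 2), "R^2:", round(r2, 3))
```

With the real dataset, the same last few lines can be applied directly to the pred_price and y_test variables from the article.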
Full code for the problem:
#importing libraries
import pandas as pd
import numpy
import matplotlib.pyplot as plt
import sklearn
#importing dataset
from sklearn.datasets import load_boston
dataset = load_boston()
#features
print(dataset.feature_names)
#as independent and dependent variables
X = pd.DataFrame(dataset.data)
y = dataset.target
#finding missing values
print(X.isnull().sum())
#train test splitting
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X, y, test_size=0.2, random_state=0)
#model creation
from sklearn.tree import DecisionTreeRegressor
dtr = DecisionTreeRegressor()
dtr.fit(X_train,y_train)
#prediction
pred_price = dtr.predict(X_test)
Having looked at one more practice problem in decision tree regression, you should have a better understanding than before. Go through the steps again and try to write the code on your own. The dataset can be imported directly through scikit-learn, so there is no need to download it. Try working it out...
KEEP READING !!!