Polynomial Regression - A Practice Problem

As discussed earlier, regression is one of the most commonly used methods in applications, and its range of uses is wide. In this part of our regression series we work through a polynomial regression practice problem to make you more familiar with the technique in a simple, hands-on manner.

    Polynomial regression is a form of regression in which the relation between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. It is also called polynomial linear regression: the model is "linear" because it is linear in the coefficients, even though it is nonlinear in x.
The equation for polynomial regression is as follows:

y = b₀ + b₁x + b₂x² + ... + bₙxⁿ
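Because the model is just a weighted sum of powers of x, evaluating it is straightforward. Here is a minimal sketch with hypothetical coefficient values (the function name and coefficients are illustrative, not from the article's dataset):

```python
import numpy as np

def poly_predict(x, coeffs):
    """Evaluate y = b0 + b1*x + b2*x^2 + ... + bn*x^n.

    coeffs = [b0, b1, ..., bn] are the polynomial coefficients.
    """
    powers = np.arange(len(coeffs))              # exponents 0, 1, ..., n
    return np.sum(np.asarray(coeffs) * np.power(x, powers))

# Hypothetical degree-2 example: y = 1 + 2x + 3x^2 at x = 2
print(poly_predict(2.0, [1.0, 2.0, 3.0]))  # 1 + 4 + 12 = 17.0
```

Note that the expression is linear in `coeffs`, which is exactly why ordinary least squares can fit it.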
Let's go through an example.

    In this polynomial regression example we use the Position_Salaries dataset (download dataset). It is a dataset of the salaries for different positions in a company. It is small and perfect for a beginner, so let's start our implementation.
Have a look at the dataset.



So let's implement it in Python. First, import the essential libraries for preprocessing and visualization.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Next we have to import the dataset and slice it into the independent variable (X) and the dependent variable (y).
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values
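Since the CSV isn't reproduced here, a small sketch with a hypothetical three-row stand-in shows what the slicing above does: `iloc[:, 1:-1]` keeps the middle "Level" column as a 2-D array for X, and `iloc[:, -1]` takes the last "Salary" column as y.

```python
import pandas as pd

# Hypothetical stand-in for Position_Salaries.csv: Position, Level, Salary
dataset = pd.DataFrame({
    'Position': ['Analyst', 'Manager', 'CEO'],
    'Level': [1, 2, 3],
    'Salary': [45000, 80000, 1000000],
})

X = dataset.iloc[:, 1:-1].values  # all columns between first and last -> shape (3, 1)
y = dataset.iloc[:, -1].values    # last column -> shape (3,)
print(X.shape, y.shape)  # (3, 1) (3,)
```

Keeping X two-dimensional matters: scikit-learn estimators expect a 2-D feature matrix.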

Now let's visualize the data and analyse it using the matplotlib package.
plt.style.use('dark_background')
plt.scatter(X, y)
plt.ylim(0,1200000)
plt.title("position_salary")
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

The graph will be



By analyzing the graph we can see that fitting a straight line would be inefficient for this problem, so we use polynomial regression to achieve better accuracy. Since there is only a limited amount of data, we can skip the train/test split.
For reference, we will fit both a simple linear regression and a polynomial regression to the dataset. So first of all, let's create the simple linear regression model.
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)

Let's visualize our prediction.
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.ylim(0,1200000)
plt.title('Linear Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

The graph will be


Looking at the prediction, it is crystal clear that this model is not good for predicting the salary.
Now it is time to create our polynomial regression model.
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
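To see what `fit_transform` actually produces, here is a small sketch: with `degree=4`, each input value x is expanded into the feature row [1, x, x², x³, x⁴] (the leading 1 is the bias column that scikit-learn includes by default), and the linear regression then fits one coefficient per column.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=4)
row = poly.fit_transform(np.array([[2.0]]))  # a single position level of 2
print(row)  # [[ 1.  2.  4.  8. 16.]]
```

So "polynomial regression" here is just ordinary linear regression on these expanded features.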

Here we are adding polynomial features (of degree 4) to our dataset and then carrying out the same linear regression procedure we used for simple linear regression. Now let's visualize our prediction.
plt.scatter(X, y, color = 'red')
plt.ylim(0,1200000)
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('Polynomial Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

The graph will be


By analyzing the graph we can conclude that this model is considerably better than the one created with the simple linear regression algorithm. So let's make a prediction with the same value in both the simple linear and the polynomial regression model.

Prediction with the simple linear regression model:
lin_pred = lin_reg.predict([[7.5]])

The output is approximately 411258.

Prediction with the polynomial regression model:
poly_pred = lin_reg_2.predict(poly_reg.fit_transform([[7.5]]))

The output is approximately 225126.

The value predicted by the linear model doesn't make much sense, but the output of the polynomial model is far more plausible, falling between the salaries for levels 7 and 8 as we would expect.
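One way to quantify the difference between the two fits is the R² score. Sketched here on synthetic salary-like data (a hypothetical stand-in, since the real dataset isn't reproduced in this snippet), the degree-4 model fits almost perfectly while the straight line does not:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import r2_score

# Synthetic stand-in: salaries growing roughly like level^4
X = np.arange(1, 11).reshape(-1, 1).astype(float)
y = 1000.0 * X.ravel() ** 4

lin = LinearRegression().fit(X, y)                      # straight line
poly = PolynomialFeatures(degree=4)
lin2 = LinearRegression().fit(poly.fit_transform(X), y)  # degree-4 fit

r2_linear = r2_score(y, lin.predict(X))
r2_poly = r2_score(y, lin2.predict(poly.fit_transform(X)))
print(round(r2_linear, 3), round(r2_poly, 3))
```

A follow-up post covers model evaluation in more depth; R² is just the quickest check.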

Full code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

#visualizing the data
plt.style.use('dark_background')
plt.scatter(X, y)
plt.ylim(0,1200000)
plt.title("position_salary")
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

# Training the Linear Regression model on the whole dataset
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(X, y)

# Training the Polynomial Regression model on the whole dataset
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

# Visualising the Linear Regression results
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg.predict(X), color = 'blue')
plt.ylim(0,1200000)
plt.title('Linear Regression')
plt.xlabel('Position Level')
plt.ylabel('Salary')
plt.show()

# Visualising the Polynomial Regression results
plt.scatter(X, y, color = 'red')
plt.ylim(0,1200000)
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.title('Polynomial Regression')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()


# Predicting a new result with Linear Regression
lin_pred = lin_reg.predict([[7.5]])

# Predicting a new result with Polynomial Regression
poly_pred = lin_reg_2.predict(poly_reg.fit_transform([[7.5]]))

    We hope you have gained better intuition from this post, as well as from our previous regression articles. We advise you to get familiar with the techniques and theory discussed earlier for a better understanding. Do go through the methods of evaluating your model, for which we have put up a post, and don't forget to work through these algorithms and check the results yourself.

Keep Reading!!!
