
Simple Linear Regression - Practice problem


We already discussed the different types of regression techniques commonly used in machine learning in the previous article. In this article we briefly discuss simple linear regression and work through a problem statement.

Simple Linear Regression

Simple linear regression is used when we want to model the relationship between one independent variable and one dependent variable. The aim is to establish a linear relationship and predict outputs accurately, assuming that the output depends only on the feature we have chosen. Because it uses a single feature, simple linear regression is best suited to relatively simple datasets.

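y = m*x + c

is the linear equation, where m is the slope of the line and c is the intercept. Graphically, we plot the data points with the independent variable on the x-axis and the dependent variable on the y-axis, and then find the best-fit line through them: the line that minimises the sum of squared differences between the actual and predicted values (ordinary least squares).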
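To make the equation concrete, here is a minimal sketch that computes m and c by ordinary least squares using plain NumPy. The helper name fit_line and the data points are made up for illustration; the points lie exactly on y = 2x + 1, so the fitted line should recover that slope and intercept.

```python
import numpy as np

def fit_line(x, y):
    """Hypothetical helper: return slope m and intercept c of the least-squares line y = m*x + c."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of (x deviation * y deviation) divided by sum of squared x deviations
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the least-squares line always passes through (x_mean, y_mean)
    c = y_mean - m * x_mean
    return m, c

# Made-up points lying exactly on y = 2x + 1
m, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)  # 2.0 1.0
```

scikit-learn's LinearRegression, used later in this example, solves the same least-squares problem for us.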

Let's go through an example
The given dataset is the salary dataset (download Salary dataset). It contains two columns, years of experience and salary, for different employees. Our aim is to create a model that predicts an employee's salary from their years of experience. Since there is one independent and one dependent variable, we can use simple linear regression for this problem.




As usual, the first step is to import the libraries and the dataset, and then split the data into the independent variable (years of experience) as X and the dependent variable (salary) as y. The selection logic is simple: dataset.iloc[:, :-1] takes every column except the last one, and dataset.iloc[:, -1] takes only the last one.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
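The two iloc selections produce arrays of different shapes, which matters because scikit-learn expects X to be two-dimensional. A small sketch, using a made-up DataFrame as a stand-in for Salary_Data.csv (the column names and values here are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical stand-in for Salary_Data.csv: column names and values are made up
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2, 4.5],
    "Salary": [39000, 43000, 54000, 61000],
})

X = dataset.iloc[:, :-1].values  # every column except the last -> 2-D array
y = dataset.iloc[:, -1].values   # only the last column -> 1-D array

print(X.shape)  # (4, 1)
print(y.shape)  # (4,)
```

Keeping X as a one-column 2-D array is what lets it be passed directly to StandardScaler, train_test_split and LinearRegression.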

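Salary values are large, so it is tempting to apply feature scaling here. Note, however, that the scaler below is applied only to the independent variable (years of experience), so it does not reduce the range of salary at all. For simple linear regression fitted by ordinary least squares, standardising X also leaves the predictions unchanged; only the fitted slope and intercept change. In this code the scaled values are stored in a new lowercase variable x that is never used afterwards, so the model is actually trained on the unscaled X.
from sklearn.preprocessing import StandardScaler
# Optional: standardise years of experience to mean 0 and standard deviation 1.
# The result is stored in lowercase x, which the rest of this example does not use.
sc = StandardScaler()
x = sc.fit_transform(X)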
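A minimal sketch, on made-up data, showing that standardising X changes the fitted coefficients but not the predictions of an ordinary least-squares model that includes an intercept (LinearRegression fits an intercept by default):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Made-up years of experience (2-D, as scikit-learn expects) and salaries
X = np.array([[1.0], [2.0], [3.0], [5.0], [8.0]])
y = np.array([40000.0, 45000.0, 52000.0, 60000.0, 80000.0])

# Model 1: trained on the raw feature
raw_model = LinearRegression().fit(X, y)

# Model 2: trained on the standardised feature
sc = StandardScaler()
X_scaled = sc.fit_transform(X)
scaled_model = LinearRegression().fit(X_scaled, y)

# The slopes differ, but both models make the same predictions
print(raw_model.coef_, scaled_model.coef_)
print(np.allclose(raw_model.predict(X), scaled_model.predict(X_scaled)))  # True
```

If you do want scaled inputs, a common practice is to fit the scaler on the training set only (after the split) and reuse it to transform the test set, so no information from the test set leaks into training.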
 
The next step is to split X and y into a training set and a test set. For this we import train_test_split from sklearn's model_selection module. With test_size=0.3, 30% of the rows go to the test set and the remaining 70% to the training set. random_state seeds the random shuffle that happens before the split, so fixing it to an integer gives the same split, and therefore the same output, every time you run the code. Any integer works, not only 0; if you leave random_state unset, the split (and the output) changes on each run.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =0.3 , random_state = 0)
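A quick check of the split sizes and of what random_state does, on ten made-up samples: any fixed integer (42 here) reproduces exactly the same split on every call.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with a single feature
X = np.arange(10).reshape(-1, 1)
y = np.arange(10) * 1000

# Same random_state -> identical split on every call
split_a = train_test_split(X, y, test_size=0.3, random_state=42)
split_b = train_test_split(X, y, test_size=0.3, random_state=42)
same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))

X_train, X_test, y_train, y_test = split_a
print(len(X_train), len(X_test))  # 7 3
print(same)  # True
```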
Now we create the simple linear regression model using scikit-learn's LinearRegression class, which fits the line by ordinary least squares.
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
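After fitting, the slope m and intercept c of y = m*x + c are available as regressor.coef_ and regressor.intercept_. A minimal sketch on made-up training data, cross-checked against NumPy's degree-1 np.polyfit (which returns the slope first, then the intercept):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: years of experience (2-D) and salary
X_train = np.array([[1.0], [2.0], [4.0], [6.0]])
y_train = np.array([41000.0, 44500.0, 56000.0, 64000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

m = regressor.coef_[0]   # slope (one coefficient per feature)
c = regressor.intercept_  # intercept

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line
m_np, c_np = np.polyfit(X_train.ravel(), y_train, 1)
print(np.isclose(m, m_np), np.isclose(c, c_np))  # True True
```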

The model is now fitted to the training data. Next we predict the outputs for the test set and check how well the model performs.
y_predicted = regressor.predict(X_test)
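For a single feature, predict simply evaluates y = m*x + c for each row of X_test. A small sketch on made-up data that lies exactly on y = 5000*x + 35000:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data lying exactly on y = 5000*x + 35000
X_train = np.array([[1.0], [3.0], [5.0]])
y_train = np.array([40000.0, 50000.0, 60000.0])
regressor = LinearRegression().fit(X_train, y_train)

X_test = np.array([[2.0], [10.0]])
y_predicted = regressor.predict(X_test)

# The same values computed by hand from the fitted slope and intercept
manual = regressor.coef_[0] * X_test.ravel() + regressor.intercept_
print(np.allclose(y_predicted, manual))  # True
print(y_predicted)  # approximately [45000. 85000.]
```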

Predicted and actual values:-

[Table: y_predicted alongside y_test]

Comparing the predicted and actual values, the predictions deviate somewhat from the actual salaries. This is likely due, at least in part, to the small size of the dataset.
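Instead of only comparing the two columns by eye, the fit can be summarised with standard metrics from sklearn.metrics. A minimal sketch using made-up actual and predicted salaries for three test employees (these numbers are not results from the salary dataset):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

# Made-up actual and predicted salaries for three test employees
y_test = np.array([40000.0, 55000.0, 70000.0])
y_predicted = np.array([42000.0, 53000.0, 71000.0])

mae = mean_absolute_error(y_test, y_predicted)  # average size of the error
r2 = r2_score(y_test, y_predicted)              # 1.0 would be a perfect fit
print(mae)          # about 1666.67
print(round(r2, 4)) # 0.98
```

The mean absolute error is in the same units as salary, which makes it easy to interpret; R² describes how much of the variation in salary the line explains.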

Training data visualization:-

We use the matplotlib library for data visualization. The data points are drawn in red and the fitted regression line in blue.
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

[Graph: Salary vs Experience, training set]


Test set visualization:-
plt.scatter(X_test, y_test, color = 'red')
# Draw the same regression line learned from the training set
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

[Graph: Salary vs Experience, test set]


Full Code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Feature scaling (optional: x is not used below, so the model is trained on the unscaled X)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x = sc.fit_transform(X)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =0.3 , random_state = 0)

# Training the Simple Linear Regression model on the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_pred = regressor.predict(X_test)

# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
# Draw the same regression line learned from the training set
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Hopefully this has given you an intuition for carrying out a complete ML workflow with simple linear regression. In the upcoming post of this ML series we will implement multiple linear regression. Use a good IDE for running the algorithms: the Anaconda distribution provides a convenient Python setup, and either Jupyter or Spyder will do. Practice on your own and check your outputs.
