Simple Linear Regression


We have already discussed the different types of regression techniques commonly used in machine learning in the previous article. In this article we briefly discuss simple linear regression and go through a problem statement.

Simple linear regression is commonly used when we have to create a model with one independent variable and one dependent variable. The aim is to establish a linear relationship and predict outputs accurately, assuming that the output depends solely on the feature we have chosen. The technique is convenient only for relatively simple datasets in which a single feature explains the output well.

y = m*x + c

is the linear equation, where m is the slope and c is the intercept. Graphically, we find the best-fit line through the data points plotted with the independent variable on the x-axis and the dependent variable on the y-axis.
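To make the idea of a best-fit line concrete, here is a minimal sketch that computes m and c with the standard least-squares formulas. The five data points are hypothetical toy values chosen for illustration, not rows from the salary dataset.

```python
import numpy as np

# Hypothetical toy data: years of experience (x) and salary (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([40000.0, 45000.0, 50000.0, 55000.0, 60000.0])

# Least-squares estimates for the line y = m*x + c
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
c = y_mean - m * x_mean                                              # intercept

print("slope m:", m)
print("intercept c:", c)
```

Because the toy points lie exactly on a line, the fitted slope and intercept reproduce that line. With real data the points scatter around the line and m and c are the values that minimize the squared vertical distances.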

Let's go through an example.

The given dataset is the salary dataset (download Salary dataset). It contains two columns: years of experience and salary of different employees. Our aim is to create a model that predicts the salary of an employee based on years of experience. Since the data contains one independent and one dependent variable, we can use simple linear regression for this problem.

As usual, the first step is to import the libraries, load the dataset, and split it into the independent variable (years of experience) as X and the dependent variable (salary) as y. The selection logic is simple: the last column is the dependent variable and everything before it is the independent variable.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
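The two iloc slices above behave differently, which matters later because scikit-learn expects X to be two-dimensional. The sketch below shows this on a small stand-in DataFrame; the column names and values are assumptions for illustration, not the contents of Salary_Data.csv.

```python
import pandas as pd

# Hypothetical stand-in with the same two-column layout as the salary data
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 2.0, 3.2],
    "Salary": [39343.0, 43525.0, 54445.0],
})

X = dataset.iloc[:, :-1].values  # every column except the last -> 2-D array
y = dataset.iloc[:, -1].values   # only the last column -> 1-D array

print(X.shape)
print(y.shape)
```

X keeps a column dimension even with a single feature, while y is a flat array of targets.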

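Salary values are large compared with years of experience, and feature scaling is a common preprocessing step when value ranges differ this much. Note that StandardScaler is applied here to the independent variable (years of experience), so the salary range itself is not changed. Ordinary least squares linear regression also gives the same predictions with or without this scaling. In the code below the scaled result is stored in lowercase x, while the rest of the code keeps using the unscaled X, so this step does not affect the model.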

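from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Standardizes years of experience; the result is stored in lowercase x,
# while the steps below keep using the unscaled X
x = sc.fit_transform(X)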
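As a quick check of how scaling interacts with this model, the following sketch fits one regressor on the raw feature and one on the standardized feature, then compares their predictions. The data is hypothetical, and fitting the scaler on the training rows only is a common practice assumed here, not something taken from the code above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: years of experience and salary
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([41000.0, 44500.0, 51000.0, 54000.0, 61500.0, 64000.0])

# Simple manual split: first four rows for training, last two for testing
X_train, X_test = X[:4], X[4:]
y_train = y[:4]

# Model fitted on the raw feature
raw_pred = LinearRegression().fit(X_train, y_train).predict(X_test)

# Model fitted on the standardized feature; the scaler learns its mean and
# standard deviation from the training rows and reuses them on the test rows
sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)
scaled_pred = LinearRegression().fit(X_train_scaled, y_train).predict(X_test_scaled)

print(np.allclose(raw_pred, scaled_pred))
```

The two sets of predictions agree, because standardizing a feature only rescales the slope and shifts the intercept of the fitted line.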

The next step is to split X and y into a training set and a test set. For this we import train_test_split from the model_selection module of sklearn. test_size is set to 0.3, which returns a test set with 30% of the data and a training set with the remaining 70%. random_state is set to 0 so that the same split, and therefore the same output, is produced every time you run the code. Any fixed integer works the same way; the split changes from run to run only if random_state is left unset.

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =0.3 , random_state = 0)
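The effect of random_state can be checked with a small sketch. The arrays below are hypothetical placeholders, and the seed 42 is an arbitrary choice to show that the value does not need to be 0.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical placeholder data: ten samples with one feature
X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# Two calls with the same integer seed produce the same split
a_train, a_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)
b_train, b_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

print(np.array_equal(a_test, b_test))
print(len(a_train), len(a_test))
```

The important point is that the seed is fixed, not which integer is used.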

Now we have to create our simple linear regression model. For this we use the LinearRegression class from sklearn.linear_model.

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
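After fitting, the slope m and intercept c of the line y = m*x + c can be read from the coef_ and intercept_ attributes of the model. The sketch below uses hypothetical points that lie exactly on a known line, so the recovered values are easy to verify.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on the line y = 5000*x + 35000
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 5000.0 * X_train.ravel() + 35000.0

regressor = LinearRegression()
regressor.fit(X_train, y_train)

m = regressor.coef_[0]    # slope: one coefficient per feature
c = regressor.intercept_  # intercept of the fitted line

print("slope m:", m)
print("intercept c:", c)
```

On the salary data these two numbers would be read as the estimated salary increase per year of experience and the estimated salary at zero years.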

Our model is now fitted to the training set. Next we predict the outputs for the test set and check whether the model performs well.

y_predicted = regressor.predict(X_test)

Predicted and Actual values:-

y_predicted y_test

We can compare the predicted and actual values. The predictions deviate somewhat from the actual values, which is likely due in part to the small number of data points.
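One way to make this comparison more precise is to put both arrays side by side and compute summary metrics. The sketch below uses hypothetical numbers standing in for y_test and y_predicted; mean_absolute_error and r2_score come from sklearn.metrics.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, r2_score

# Hypothetical values standing in for the actual and predicted salaries
y_test = np.array([40000.0, 50000.0, 60000.0, 70000.0])
y_predicted = np.array([42000.0, 49000.0, 61000.0, 68000.0])

# Side-by-side table with the per-row error
comparison = pd.DataFrame({"y_predicted": y_predicted, "y_test": y_test})
comparison["error"] = comparison["y_predicted"] - comparison["y_test"]
print(comparison)

mae = mean_absolute_error(y_test, y_predicted)  # average absolute error
r2 = r2_score(y_test, y_predicted)              # 1.0 would be a perfect fit
print("MAE:", mae)
print("R^2:", r2)
```

The mean absolute error is in the same unit as salary, so it is easy to interpret, while R^2 shows how much of the variation in salary the line explains.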

Training data visualization:-

We use the matplotlib library for data visualization.

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Graph

Test set visualization:-

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Graph

Full Code:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Feature scaling (result is stored in lowercase x; later steps use the unscaled X)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
x = sc.fit_transform(X)

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size =0.3 , random_state = 0)

# Training the Simple Linear Regression model on the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the Test set results
y_predicted = regressor.predict(X_test)

# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience ')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

Hopefully you now have an intuition for carrying out a complete ML workflow with simple linear regression. In the upcoming post of the ML series we will discuss how to run the multiple linear regression algorithm. Try to use a good IDE for running the algorithms. The Anaconda distribution provides a good experience and supports Python programming; either Jupyter or Spyder will do. Practice by yourself and check your outputs.
