Classification Techniques in Machine Learning || Know about Logistic Regression

We have covered some of the important regression techniques in our regression series. Now we are starting our classification series. Classification is a type of supervised learning in which an algorithm learns from labeled training data how to assign data points to categories, and then predicts the category of new data points. Classification models include linear models such as Logistic Regression and SVM, and nonlinear ones such as K-NN, Kernel SVM, Decision Tree and Random Forest classification. In today's article we discuss the first linear classification algorithm, Logistic Regression.

Logistic Regression

Logistic regression is a commonly used statistical algorithm for binary classification problems, that is, problems with only two possible output values. It models the relationship between one categorical dependent variable and one or more independent variables. The algorithm gets its name from the logistic (sigmoid) function it uses. This function takes in any real value and maps it to a value between 0 and 1. The result is an S-shaped curve.

y = 1 / (1 + e^(-x))

Sigmoid Graph
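To make the shape of the curve concrete, here is a minimal sketch of the sigmoid function in NumPy. The function name sigmoid and the sample inputs are our own choices for illustration.

```python
import numpy as np

def sigmoid(x):
    """Map any real value (or array of values) into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# A few sample inputs: large negative, zero, large positive
xs = np.array([-5.0, 0.0, 5.0])
ys = sigmoid(xs)

# Values near 0 for very negative x, exactly 0.5 at x = 0, near 1 for very positive x
print(ys)
```

Plotting sigmoid over a range such as np.linspace(-10, 10, 200) with matplotlib should reproduce the S-shaped curve.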

In logistic regression, the input values are combined linearly (using some coefficients), much as in linear regression, and the result is passed through the sigmoid function to give a value between 0 and 1. This value is read as the probability of belonging to one of two categories, such as red/blue, male/female or pass/fail. The final output, however, is either 0 or 1: we set a threshold value (typically 0.5, or 50%), and all values above the threshold are considered 1 (or Yes) while values below it are considered 0 (or No).
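The idea above can be sketched in a few lines. The coefficients below are hypothetical values picked only for illustration, not values fitted to any dataset, and the helper names are our own.

```python
import numpy as np

# Hypothetical coefficients, chosen only for illustration (not fitted to data)
weights = np.array([0.8, -0.4])
bias = -0.2

def predict_proba(features):
    """Linear combination of the inputs, squashed into (0, 1) by the sigmoid."""
    z = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-z))

def predict_label(features, threshold=0.5):
    """Probabilities above the threshold become 1, the rest become 0."""
    return (predict_proba(features) > threshold).astype(int)

samples = np.array([[2.0, 1.0],    # z = 1.0
                    [-1.0, 1.0]])  # z = -1.4
probs = predict_proba(samples)
labels = predict_label(samples)
print(probs)
print(labels)
```

The first sample gets a probability above 0.5 and is labeled 1; the second falls below 0.5 and is labeled 0.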

Logistic regression has applications in healthcare, in marketing (for example, predicting whether a person will purchase a product), and in predicting whether a given process succeeds or fails.

Example

In this example we implement a logistic regression model on a dataset to get a clear idea of how this classification algorithm works.

About the dataset: We are using the Social Network Ads dataset for this problem. The dataset contains details of users of a social networking site, and the task is to predict whether a user buys a product after clicking an ad on the site, based on their gender, age and salary.

Let's start the programming by importing essential libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

Next, we import the dataset and slice it into the independent variables (X) and the dependent variable (y)

dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

Since the first column of X holds character (string) values, we have to encode it as numbers using LabelEncoder

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
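To see what the encoder does, here is a small standalone sketch. The sample values "Male" and "Female" are assumed for illustration; check the actual values in your copy of the CSV.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up sample values standing in for the gender column
genders = np.array(["Male", "Female", "Female", "Male"])

le_demo = LabelEncoder()
encoded = le_demo.fit_transform(genders)

# Classes are sorted alphabetically, then numbered from 0
print(le_demo.classes_)
print(encoded)
```

Each distinct string is replaced by an integer index into classes_, so the column can be used by a numeric model.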

We now perform a train-test split on our dataset. We set the test size to 0.20, which means the training set contains 320 samples and the test set contains 80 samples

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
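The 320/80 figures can be checked on placeholder data of the same size. The arrays below are synthetic stand-ins with 400 rows (the size implied by the split above), not the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 400 rows, 3 feature columns, binary labels
X_demo = np.arange(400 * 3).reshape(400, 3)
y_demo = np.arange(400) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=0
)

# 20% of 400 rows go to the test set, the remaining 80% to the training set
print(X_tr.shape, X_te.shape)
```

Fixing random_state makes the shuffle reproducible, so the same rows land in each set on every run.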

Next, we apply feature scaling to the independent variables of the training and test sets. The scaler is fitted on the training set only and then applied to the test set

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
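A small sketch of what StandardScaler does, using made-up age and salary values. Each training column ends up with mean 0 and standard deviation 1, and the test rows are scaled with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up (age, salary) rows for illustration
train = np.array([[20.0, 20000.0],
                  [30.0, 40000.0],
                  [40.0, 60000.0]])
test = np.array([[30.0, 40000.0],
                 [50.0, 80000.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training mean and std

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(test_scaled)
```

Without scaling, the salary column would dominate the age column simply because its numbers are much larger.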

Next is the important step: creating the Logistic Regression model

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
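To connect the fitted model back to the sigmoid function, the sketch below trains on a tiny made-up dataset and recomputes the predicted probabilities by hand from coef_ and intercept_. The toy data and variable names are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny made-up dataset: negative inputs belong to class 0, positive to class 1
X_toy = np.array([[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression(random_state=0)
clf.fit(X_toy, y_toy)

# Probability of class 1 as reported by scikit-learn
proba = clf.predict_proba(X_toy)[:, 1]

# The same probability computed manually: sigmoid of the linear combination
z = X_toy @ clf.coef_.ravel() + clf.intercept_[0]
manual = 1.0 / (1.0 + np.exp(-z))

# predict() corresponds to thresholding that probability at 0.5
toy_labels = clf.predict(X_toy)
print(proba)
print(toy_labels)
```

The same attributes are available on the classifier fitted on the real training set.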

Let's predict the test results

y_pred = classifier.predict(X_test)

[Figure: predicted values (y_pred) alongside actual values (y_test)]

The first 8 values are the same in both. We can evaluate our model using a confusion matrix and an accuracy score, comparing the predicted and actual test values

from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)

confusion matrix -

ac - 0.9125
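To read a confusion matrix, it helps to work through a small case by hand. The labels below are made up for illustration and are not the results of the model above.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

# Made-up actual and predicted labels for 8 samples
actual = np.array([0, 0, 0, 1, 1, 1, 0, 1])
predicted = np.array([0, 0, 1, 1, 1, 0, 0, 1])

# Rows are actual classes, columns are predicted classes
cm_demo = confusion_matrix(actual, predicted)
tn, fp, fn, tp = cm_demo.ravel()

# Accuracy is the share of correct predictions: the diagonal over the total
acc_manual = (tn + tp) / cm_demo.sum()

print(cm_demo)
print(acc_manual, accuracy_score(actual, predicted))
```

Here 6 of the 8 predictions sit on the diagonal (3 true negatives and 3 true positives), so the accuracy is 0.75.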

An accuracy of 0.9125 means that 73 of the 80 test samples were classified correctly, which is a good result. You may achieve better results for this problem using different algorithms, which will be discussed in future lessons.

Full Code -

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the data
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
print(cm)
print(ac)

That was all about Logistic Regression, one of the primary classification techniques. Although the name contains "regression", you should now understand that it is a classification algorithm. You now know how to classify data points into categories when a dataset is given. Logistic regression may not work well for every dataset; for complex problems we have other algorithms. In upcoming posts we will see more classification algorithms. To become familiar with the technique, we suggest you work through the problem yourself.

Happy Reading!!!
