Ticker

6/recent/ticker-posts

Classification Techniques in Machine Learning || Know about Logistic Regression

  logistic regression classification

  We have covered some of the important regression techniques in our regression series. Now we are starting our classification series. Classification is a type supervised learning in which the training datas are classified into certain categories using some algorithm and the prediction is done based on this classification. Classification models include linear models like Logistic Regression, SVM, and nonlinear ones like  K-NN, Kernel SVM, Decision tree and  Random Forests classification.It is a supervised learning. In today's article we are discussing about the first linear classification algorithm Logistic regression

Logistic Regression

    Logistic regression is commonly used statistical algorithm for binary classification problems, that is ,it has only two output values.It usually explains the relationship between one categorically dependent variable and more than one independent variables.The algorithm was named so since it used the logistic function(sigmoid).This function basically takes in any real value and maps it to a value between 0 and 1.The result is a S-shaped curve.

y=1/(1+e^-x)

Sigmoid Graph 

    Basically in logistic regression ,the input values are taken to establish a linear relation(using some coefficients) and thus predict an output y, much similar to linear regression.But the final output is either 0 or 1.The predicting the output is like predicting the probability to fall into two categories like red/blue ,male/female ,pass/fail etc.. In all the problem we are setting  a threshold value(0.5 or 50%) , all the value more than this threshold value will considered as 1 (or yes) and values less than this threshold are 0 (or No).

Logistic Regression has its applications in healthcare,marketing(like if a person purchased a product or not),to predict if a given process succeeds or fails. etc..

Example

    In this example problem we are going to implement a logistic regression model for a dataset  in order to get a clear idea behind this classification algorithm.
About dataset: We are using the Social network ad dataset (download) for this problem. The dataset contains the details of users in a social networking site to find whether a user buy a product by clicking the ad in the site based on their salary,age and gender.
Let's start the programming by importing essential libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
Importing of dataset and slicing it into independent and dependent variables
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
Since our dataset containing character variables we have to encode it using LabelEncoder
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
We are performing train test split on our dataset. We are providng the test size as 0.20, that means our training sample contains 320 training set and test sample contains 80 test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
Next, we are doing feature scaling to the training and test set of independent variables
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
Next is the important step, Logistic regression model creation
 from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
Let's predict the test results
y_pred = classifier.predict(X_test)
Predicted and actual value - 

                                   

                                    y_pred                                                    y_test


For the first 8 values both are same.We can evaluate our matrix using confusion matrix and accuracy score by comparing the predicted and actual test values
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
confusion matrix - 

                

ac  -  0.9125

Accuracy is good. You can achieve better results for this problem using different algorithms, that will be discussed in the future lessons.

Full Code -
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1,2, 3]].values
y = dataset.iloc[:, -1].values

#encoding the data
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix,accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test,y_pred)
print(cm)
print(ac)


That was all about Logistic Regression ,one of the primary classification technique.Although the name contain regression now you might have understood that this one is a classification algorithm.Now you know to classify the datapoints into categories when a dataset is given.This may not always work well for all datasets.For complex problems we have other algorithms.In the upcoming posts we will see more classification algorithms.To get familiarize we suggest you to work out the problem.

Happy Reading!!!

Post a Comment

0 Comments