We have covered some of the important regression techniques in our regression series. Now we are starting our classification series. Classification is a type of supervised learning in which an algorithm learns from labeled training data to sort inputs into categories, and predictions for new data are made based on what it has learned. Classification models include linear models such as Logistic Regression and SVM, and nonlinear ones such as K-NN, Kernel SVM, Decision Tree, and Random Forest classification. In today's article we discuss the first linear classification algorithm: Logistic Regression.
Logistic Regression
Logistic regression is a commonly used statistical algorithm for binary classification problems, that is, problems with only two possible output values. It models the relationship between one categorical dependent variable and one or more independent variables. The algorithm gets its name from the logistic (sigmoid) function it uses. This function takes any real value and maps it to a value between 0 and 1, and its graph is an S-shaped curve.
y = 1 / (1 + e^(-x))
Sigmoid Graph
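To make the curve concrete, here is a minimal sketch of the sigmoid function in plain Python. The function name `sigmoid` and the sample inputs are our own choices for illustration. The branch for negative inputs is an algebraically equivalent form of the same formula, used only to avoid overflow for very negative x.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic (sigmoid) function: y = 1 / (1 + e^(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form e^x / (1 + e^x), which avoids overflow for very negative x
    z = math.exp(x)
    return z / (1.0 + z)

# Sample inputs chosen for illustration: large negative values approach 0,
# large positive values approach 1, and x = 0 sits exactly in the middle.
for x in (-6, -2, 0, 2, 6):
    print(f"sigmoid({x:>2}) = {sigmoid(x):.4f}")
```

Note that sigmoid(0) is exactly 0.5, which is why 0.5 is a natural cut-off between the two classes.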
In logistic regression, the input values are combined linearly (using some coefficients) to predict an output y, much as in linear regression. That linear output is then passed through the sigmoid function, so predicting the output amounts to predicting the probability of falling into one of two categories such as red/blue, male/female, or pass/fail. The final output is either 0 or 1: in all of these problems we set a threshold value (0.5, or 50%), and every value above the threshold is treated as 1 (yes) while every value below it is treated as 0 (no).
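The idea above (a linear combination of the inputs, turned into a probability, then cut at a threshold) can be sketched in a few lines of plain Python. This is only an illustration: the function names are ours, and the weights and bias are hypothetical values picked by hand, not coefficients fitted to any data.

```python
import math

def predict_probability(features, weights, bias):
    """Probability of class 1: sigmoid of the linear combination bias + w . x."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Return 1 when the probability is above the threshold, otherwise 0."""
    return 1 if predict_probability(features, weights, bias) > threshold else 0

# Hypothetical coefficients, chosen by hand for illustration (not fitted to data)
weights = [0.8, -0.4]
bias = -0.2

for features in ([2.0, 1.0], [0.5, 1.5], [-1.0, 0.0]):
    p = predict_probability(features, weights, bias)
    print(features, round(p, 3), predict_class(features, weights, bias))
```

Raising the threshold makes the model more reluctant to answer 1; with the default of 0.5, the answer is 1 exactly when the linear combination is positive.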
Logistic regression has applications in healthcare, in marketing (for example, predicting whether a person will purchase a product), and in predicting whether a given process succeeds or fails.
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn
# Importing the dataset: columns 1 to 3 are the features, the last column is the target
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values
# Encoding the first feature column as integers
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])
# Splitting the dataset into the Training set (80%) and Test set (20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
# Feature Scaling: fit on the training set only, then apply the same scaling to the test set
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
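The scaling step calls fit_transform on the training set but only transform on the test set. The sketch below shows the same idea for a single feature in plain Python: the mean and standard deviation are learned from the training values and then reused unchanged on the test values. The helper names and the numbers are made up for illustration; the population standard deviation is used here because that is what StandardScaler computes.

```python
from statistics import fmean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation of one feature column."""
    return fmean(column), pstdev(column)

def apply_scaler(column, mean, std):
    """Standardize values with statistics learned elsewhere (here: the training data)."""
    return [(value - mean) / std for value in column]

# Made-up values for illustration only
train_values = [20.0, 30.0, 40.0, 50.0]
test_values = [35.0, 60.0]

mean, std = fit_scaler(train_values)           # statistics come from the training set
print(apply_scaler(train_values, mean, std))   # centered on 0 with unit spread
print(apply_scaler(test_values, mean, std))    # scaled with the training statistics
```

Reusing the training statistics keeps information from the test set out of the preprocessing, so the test score stays an honest estimate.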
# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
# Predicting the Test set results
y_pred = classifier.predict(X_test)
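The predict call returns only the final 0/1 labels. To see how they relate to the 0.5 threshold discussed earlier, the sketch below compares predict with a manual cut on predict_proba. It uses synthetic data from make_classification as a stand-in, since that is an assumption that lets the snippet run without the CSV file; the variable names ending in `_demo` are ours.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption; the article itself uses Social_Network_Ads.csv)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=3, n_informative=2, n_redundant=0, random_state=0
)

model = LogisticRegression(random_state=0)
model.fit(X_demo, y_demo)

# Column 1 of predict_proba holds the estimated probability of class 1
proba_class1 = model.predict_proba(X_demo)[:, 1]

# Cutting those probabilities at 0.5 should reproduce predict() for a binary problem
manual_pred = (proba_class1 > 0.5).astype(int)
print(np.array_equal(manual_pred, model.predict(X_demo)))
```

Working with predict_proba directly is useful when a threshold other than 0.5 suits the problem better.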
# Making the Confusion Matrix and computing the accuracy
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
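For readers new to these two metrics, here is a small plain-Python sketch of what they count. It follows the layout scikit-learn uses for binary labels, with rows for the actual class and columns for the predicted class: [[TN, FP], [FN, TP]]. The helper names and the labels are made up for illustration and are not results from the dataset above.

```python
def confusion_counts(y_true, y_pred):
    """Tally a 2x2 matrix laid out as [[TN, FP], [FN, TP]] (rows: actual, columns: predicted)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

def accuracy_from_matrix(matrix):
    """Accuracy = correct predictions (the diagonal) divided by all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total

# Made-up labels for illustration only
y_true_demo = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred_demo = [0, 1, 1, 0, 0, 1, 0, 1]

cm_demo = confusion_counts(y_true_demo, y_pred_demo)
print(cm_demo)                        # [[3, 1], [1, 3]]
print(accuracy_from_matrix(cm_demo))  # 0.75
```

Here 6 of the 8 predictions lie on the diagonal, so the accuracy is 6/8 = 0.75.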
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import sklearn

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [1, 2, 3]].values
y = dataset.iloc[:, -1].values

# Encoding the data
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
X[:, 0] = le.fit_transform(X[:, 0])

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Training the Logistic Regression model on the Training set
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
ac = accuracy_score(y_test, y_pred)
print(cm)
print(ac)