Logistic Regression is a classical statistical model, which has been widely used in academia and industry to solve binary classification problems.

Difference between Regression and Classification

Supervised Machine Learning can be split into two subcategories – Regression and Classification. The difference between the two is that in Regression we are predicting a continuous number, like the price of a house or the temperature for the next day, whilst in Classification we are predicting discrete values, like whether or not a patient has heart disease.

Logistic Regression is a statistical method that was designed to solve binary classification problems. It achieves this by passing the input through a linear function and then transforming the output to a probability value with the help of a sigmoid function.

Without the sigmoid function, Logistic Regression would just be Linear Regression. That means that the output of the model could range from -∞ to ∞. That's fine when working on a regression task, but for binary classification the output needs to be a probability value. This is where the sigmoid function comes in. It squeezes the output z of the linear function between 0 and 1: all inputs greater than 0 produce an output greater than 0.5, and all inputs less than 0 produce an output less than 0.5. Mathematically, the sigmoid function looks like:

σ(z) = 1 / (1 + e^(-z)),  where z = θᵀx

To get a discrete class value (either 0 or 1), a decision boundary must be chosen. The decision boundary specifies how high the probability must be so that we get an output of 1. Generally, the decision boundary is 0.5, so that if the output is >= 0.5 we get class 1, else class 0.

For Logistic Regression we can't use the same loss function as for Linear Regression, because the Logistic Function (Sigmoid Function) would make it non-convex, which would cause many local optima. Instead, we will use the following loss function for logistic regression:

Cost(h_θ(x), y) = -log(h_θ(x))        if y = 1
Cost(h_θ(x), y) = -log(1 - h_θ(x))    if y = 0

At first glance, the functions look complex, but when visualized they are quite easy to grasp: the further away the prediction is from the actual y value, the bigger the loss gets. That means that if the correct answer is 0, the cost will be 0 if the prediction is also 0, and will approach infinity as the prediction approaches 1. Likewise, if the correct answer is 1, the cost will be 0 if the prediction is 1, and will approach infinity as the prediction approaches 0.

To make it easier to work with the loss function, we can compress the two conditional cases into one equation (i runs over the m training examples):

J(θ) = -(1/m) · Σᵢ [ y^(i) · log(h_θ(x^(i))) + (1 - y^(i)) · log(1 - h_θ(x^(i))) ]

Notice that when y is equal to 1, the second term is zero and therefore does not affect the loss. On the other hand, when y is equal to 0, the first term is zero and therefore does not affect the loss.

To find the coefficients (weights) that minimize the loss function we will use Gradient Descent. There are more sophisticated optimization algorithms out there, such as Adam, but we won't worry about those in this article. Remember, the general form of gradient descent looks like:

θ_j := θ_j - α · ∂J(θ)/∂θ_j

We can get the gradient descent formula for Logistic Regression by taking the derivative of the loss function. This is quite involved, therefore I will show you the result first, and you can skip the process of getting to the result if you like:

∂J(θ)/∂θ_j = (1/m) · Σᵢ (h_θ(x^(i)) - y^(i)) · x_j^(i)

Notice that the result is identical to the one of Linear Regression.
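To make these formulas concrete, here is a minimal NumPy sketch of the sigmoid, the probability prediction, and the compressed loss. The helper names (`sigmoid`, `predict_proba`, `compute_loss`) and the small epsilon used to avoid log(0) are my own illustrative choices, not something prescribed by the math above.

```python
import numpy as np

def sigmoid(z):
    # Squeezes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, theta):
    # Linear function z = X @ theta, then sigmoid to get a probability
    return sigmoid(X @ theta)

def compute_loss(X, y, theta):
    # Compressed loss: J = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    m = len(y)
    h = predict_proba(X, theta)
    eps = 1e-15  # keeps log() away from exactly 0
    return -(1.0 / m) * np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
```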
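Building on those helpers, here is a bare-bones gradient descent loop that uses the gradient result above. The learning rate and iteration count are arbitrary values chosen for illustration:

```python
def gradient_descent(X, y, lr=0.1, n_iters=1000):
    # X: (m, n) feature matrix, y: (m,) labels in {0, 1}
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = predict_proba(X, theta)
        grad = (1.0 / m) * (X.T @ (h - y))  # same form as Linear Regression
        theta -= lr * grad
    return theta

# Tiny made-up example: the first column of X acts as the bias term
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print((predict_proba(X, theta) >= 0.5).astype(int))  # 0.5 decision boundary -> [0 0 1 1]
```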
Deriving the Gradient Descent formula for Logistic Regression (Optional)

First we need to calculate the derivative of the sigmoid function. The derivative of the sigmoid function is quite easy to calculate using the quotient rule:

σ'(z) = d/dz [ 1 / (1 + e^(-z)) ] = e^(-z) / (1 + e^(-z))² = σ(z) · (1 - σ(z))

(A quick numerical check of this identity appears at the very end of the article.)

Now we are ready to find out the partial derivative. For a single training example, writing h = h_θ(x) = σ(θᵀx) and applying the chain rule together with the sigmoid derivative above:

∂J/∂θ_j = -(y/h - (1 - y)/(1 - h)) · h · (1 - h) · x_j = (h - y) · x_j

Averaging this over all m training examples gives exactly the gradient stated earlier.

Even though Logistic Regression was created to solve binary classification problems, it can also be used for more than two classes, as the sketch below shows.
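One common way to do this is the one-vs-rest strategy, which is my own choice of illustration here rather than something this article derives: train one binary classifier per class, then pick the class whose classifier outputs the highest probability. This sketch reuses the `predict_proba` and `gradient_descent` helpers from earlier:

```python
def train_one_vs_rest(X, y, n_classes, lr=0.1, n_iters=1000):
    # One binary logistic regression per class: class k vs. everything else
    return [gradient_descent(X, (y == k).astype(float), lr, n_iters)
            for k in range(n_classes)]

def predict_multiclass(X, thetas):
    # Score every class, then pick the one with the highest probability
    probs = np.column_stack([predict_proba(X, t) for t in thetas])
    return np.argmax(probs, axis=1)
```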
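Finally, as promised, a quick numerical sanity check of the sigmoid derivative from the derivation section: the analytic form σ(z)·(1 - σ(z)) should agree with a finite-difference approximation. This is a throwaway snippet, not part of any library:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
analytic = sigmoid(z) * (1 - sigmoid(z))                      # via the quotient rule
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)   # central difference
print(np.max(np.abs(analytic - numeric)))                     # tiny, so the two agree
```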