## IntroductionIn my last post, we walked through the construction of a two-layer neural network and used it to classify the MNIST dataset. Today, I will show how we can build a logistic regression model from scratch (spoiler: it’s much simpler than a neural network). Linear and logistic regression are probably the simplest yet useful models in a lot of fields. They are fast to implement and also easy to interpret. In this post, I will first talk about the basics of logistic regression, followed by model construction and training on my NBA shot log dataset, then I will try to interpret the model through statistical inference. Finally, I will compare it with the built-in ## Logistic regressionLogistic regression is a generalized linear model, with a binominal distribution and logit link function. The outcome \(Y\) is either 1 or 0. What we are interested in is the expected values of \(Y\), \(E(Y)\). In this case, they can also be thought as probability of getting 1, \(p\). However, because \(p\) is bounded between 0 and 1, it’s hard to implement the method similar to what we did for linear regression. So instead of predicting \(p\) directly, we predict the log of odds (logit), which takes values from \(-\infty\) to \(\infty\). Now the function is: \(\log(\frac{p}{1-p})=\theta_0 + \theta_1x_1 + \theta_2x_2 + ...\), let’s denote the RHS as \(x\cdot\theta\). Next the task is to find \(\theta\). In logistic regresion, the cost function is defined as: \(J=-\frac{1}{m}\sum_{i=1}^m(y^{(i)}\log(h(x^{(i)}))+(1-y^{(i)})\log(1-h(x^{(i)})))\), where \(h(x)=\frac{1}{1+e^{-x\cdot\theta}}\) is the sigmoid function, inverse of logit function. We can use gradient descent to find the optimal \(\theta\) that minimizes \(J\). So this is basically the process to construct the model. It is actually simpler than you think when you starting coding. ## Model construction in RNow let’s build our logistic regression. First I will define some useful functions. Note
Here comes the logistic regression fuction, which takes training dataframe X, and label y as function input. It returns a column vector which stores the coefficients in theta. One thing to pay attention to is that the input X usually doesn’t have a bias term, the leading column vector of 1, so I added this column in the function.
## Training with NBA shot log datasetNow let’s train our model with the NBA shot log dataset. Specifically, I am interested in how will shot clock, shot distance and defender distance affect shooting performance. Naively, we would think
How do we interpret the model? -
The first number is the intercept. It is the log odds of a FG if all other predictors are 0. Note if log odds is 0, the probality is 0.5. So the negative intercept means <50%. -
The next three numbers are the coefficients for SHOT_CLOCK, SHOT_DIST, CLOSE_DEF_DIST. For every unit increase in the predictor, the coefficient is the change of log odds while holding other predictors to be constant. -
For example, let’s look at the last number. While holding others the same, if the defender moves 1 ft farther away, the log odds of that shot will increase by 0.106. -
If the original FG% is 50%, the new FG% will be 52.6% if the defender is 1 ft farther.
Now, look at the signs of the coefficients, we can conclude that increase in SHOT_CLOCK, CLOSE_DEF_DIST and decrease in SHOT_DIST will all have positive impact in FG%. Next, let’s compare our self-built model with the
We did a pretty good job as the coefficient are almost identical to 3rd decimal place. ## Prediction function and the Expected FG%Finally, I will write a prediction function that will output the probability of getting 1 in a logistic regression.
Generate a new data grid to see how FG% changes with predictors.
## Visulize the impactShot clock seems to have the least impact, so I will exclude that in this plot.
Indeed, wide open shots in the paint have the highest probability and contested long 3s have the lowest. This plot conveys very similar information as the one I did in my shiny app. However, doing regression smoothens things out (regression to mean?) and losses some important features. For example, the predictions of extreme cases (shot distance < 5ft or > 35ft) are all less drastic than what the reality is. Adding higher order terms will be a way to improve. ## ConclusionThere you have it, it is not that hard for ourselves to build a regression model from scratch (as we also demonstrated in neural network). If you follow this post, hopefully by now, you have a better understanding of logistic regression. One last note, although logistic regression is often said to be a classifier, it can also be used for regression (to find the probability).
0 Comments
## Leave a Reply. |
## AuthorA mechanical engineer who also loves data. ## Archives
January 2018
## Categories## Blogs I enjoy reading |