Update: It appears to me that nba.com has removed the access to shot log data as of 02/10/2016. The scraped data is still available on my github.
It is hard to convincingly evaluate a player's contribution to team offense, given so many factors involved, both objective and subjective. I took a shot at this problem by analyzing what kind of shot a player takes and how's that compared with league average. The result, as we shall see later, is quite inline with what we would expect. This is part I of the 2-part series. Part I mainly focuses on the analysis, while part II on building the Shiny app, with a potential part III for case studies.
tl;dr: here is my final web app for this project. You can right click the plot and download it to create your own case study. Enjoy!
Data used in the analysis
I scraped each player's shot log from stat.nba.com for the last two seasons. The code for this task can be found in my previous post. The data consists of almost 300k shots in total and 400+ players each season. For each shot, it provides metrics like shot distance, shot clock, defender distance and the outcome of the shot etc. In this analysis, I will choose the two most important ones, shot distance and defender distance. They comprise an essential portion of a player's shot selection and have fundamental influence on the outcome of the shot.
Analysis using R
After the data is ready-tidy, the fun analysis begins with breaking down shot distance and defender distance. I divide shot distance into 8 catogories: 0-5 ft all the way up to 35+ with 5 ft interval and defender distance into 4: 0-2 ft up to 6+ with 2 ft interval.
One for loop for different defender distance nested inside another for loop for shot distance. The following function takes a single data.frame argument in the shot log format and returns the FG%, total FGA, total FGM Pts and 2pt/total for each shot distance and defender distance. So depending on the input data.frame, we can get either the league average or the data for an individual player. I used chaining to simplify the code and it worked really well. I tried to use ddply(), it worked in the main script, however not inside a function due to some scoping bug.
Once the function is constructed, we can first look at the league average shooting performance.
This plot has two parts: the semi-transparent bar for league average and solid color for the player of interest, in this case league average as well (I kept both for consitensy with later plots for individual players).
There is a clear trend that the closer to the basket, the better FG%. For the same range of shot distance, the farther the defender is, the better FG%. We've got a large enough sample size that we actually statistically proved it (except shot distance > 30 ft, where the sample size is limited)!
As for shot selection, about quarter of the total shots are inside 5 ft or between 20-25 ft (this is a choice of compromise. NBA has a varying 3pts distance from 22 to 24 ft, so it is difficult to use shot distance for 2pt/3pt indication. However, about 75% of the shots between 20-25 ft are 3 pointers. So this range is a fairly good indicator for close 3 pointers). These two types shots are easy shots, or easy shots-with-good-return. 8.5% are long 3 pointers and less than 1% total for desperation shots (30-35 ft and 35+ ft).
On the other hand, 46% of the shots are open (and 18% are wide open). Another 18% are closely contested while 37% are within normal defender distance 2-4 ft. By displaying an individual player's shot selection, we can see whether this player likes 3 pointers, prefers open shots and etc.
Speaking of season to season variation, there is very little difference between the last two seasons (although this seanson I only gathered about 1/3 of the shots as last season). You can take a look by yourself.
Player value added to team offense
Now let me try to answer the question I posted at the beginning of this article. The way I look at it is that I trust a player's shot selection under circumstances on the court (after all, too many bad shots would get you benched). However, the player should be responsible for the shot choice he took: if he ouperformes league average on a contested long 3, he has added positive value to team offense. If the player misses a wide open lay up, he has added a much negative value because wide open lay ups have very high points expectation. With that in mind, here is my take on player value:
1, obtain the league average FG% (as in previous section)
2, for each player, calculate the expected Pts (equals FGA * average FG%) and actual Pts scored at each location each defender distance
3, take the difference between these two, the more positive the difference is, the more value that player added to team offense from a scoring point of view.
As we can see from this plot, Curry has added the most value to his team while also taken a tremendous amount of shots. In fact, he added almost 5 points per game to Warriors' offense. KD takes less shots, but he is actually more efficient than Curry per shot attempt. Damian Lillard and James Harden take most shots in this season and they are about average. In this analysis, I didn't take free throws into account. As a result, players like James Harden are under-appreciated.
For those who have negative impact to their team, we see Mudiay and Rose leading the way. If you take a look their FG% break down, it is mostly because they couldn't finish contested shots in the paint and open 3 pointers. Notably, Kobe is also in the bottom 5.
All right, there you have it. I will put another article on how to construct the shiny app during this holiday season. Until then, have fun comparing your favourite player with the rest!