Hockey Analytics, Strategy, & Game Theory

Strategic Snapshot: Isolating QREAM

I’ve recently attempted to measure goaltending performance by comparing the number of expected goals a goaltender faces to the number of goals they actually allow. Expected goals are ‘probabilistic goals’ based on what we have data for (which isn’t everything): if that shot were taken 1,000 times on the average goalie who made the NHL, how often would it be a goal? Any single shot carries variance, since the puck either goes in or it doesn’t, but summing expected goals over the course of a season gives a better idea of how the goaltender is performing because we can adjust for the quality of shots they face, helping isolate their ‘skill’ in making saves. The metric, which I’ll refer to as QREAM (Quality Rules Everything Around Me), reflects goaltender puck-saving skill better than raw save percentage and is more stable within a goalie’s season.
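The post does not spell out the exact QREAM formula, so the following is only a minimal sketch of the underlying idea: sum the expected goals a goalie faced and subtract the goals actually allowed. The shot probabilities and the function names are hypothetical illustration values, not output from the expected goal model.

```python
# Each shot is (expected goal probability, 1 if it went in else 0).
# The probabilities are made-up values for illustration only.
shots = [(0.03, 0), (0.12, 1), (0.08, 0), (0.31, 0), (0.05, 0), (0.22, 1)]


def goals_saved_above_expected(shots):
    """Expected goals faced minus actual goals allowed (positive is good)."""
    expected = sum(prob for prob, _ in shots)
    allowed = sum(goal for _, goal in shots)
    return expected - allowed


def raw_save_percentage(shots):
    """Share of shots stopped, ignoring how dangerous each shot was."""
    allowed = sum(goal for _, goal in shots)
    return 1 - allowed / len(shots)


gsax = goals_saved_above_expected(shots)
save_pct = raw_save_percentage(shots)
```

Raw save percentage treats all six shots alike, while the expected-goal sum weights each shot by how dangerous it was, which is the adjustment the paragraph above describes.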
Goalies doing the splits
Good stuff. We can then use QREAM to break down goalie performance by situations, tactical or circumstantial, to reveal actionable trends. Is goalie A better on shots from the left side or right side? Left shooters or right shooters? Wrist shots, deflections, etc? Powerplay? Powerplay, left or right side? etc. We can even visualise it, and create a unique descriptive look at how each goaltender or team performed.

This is a great start. The next step in confirming the validity of a statistic is looking at how it holds up over time. Is goalie B consistently weak on powerplay shots from the left side? Is that something that can be exploited by looking at the data? Predictivity is important to validate a metric, showing that it can be acted upon and that some sort of result can be expected. Unfortunately, year-over-year trends by goalie don’t hold up in an actionable way. There might be a few persistent trends below, but nothing systemic that would be more prevalent than what luck alone would produce. Why?
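One simple way to check whether a split ‘holds up’ is to correlate the same goalies’ results on the same split in consecutive seasons. This is a sketch of that check, not necessarily the test used for this post; the per-goalie numbers are invented and the function name is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical goals saved above expected per 100 shots on one split
# (say, powerplay shots from the left side), one entry per goalie.
season_1 = [1.2, -0.8, 0.3, -1.5, 0.9, 0.1]
season_2 = [-0.4, 0.6, 0.2, -0.3, -0.9, 0.8]

year_over_year = pearson(season_1, season_2)
```

A correlation near zero across many goalies and many splits is what ‘nothing more prevalent than luck’ would look like; a persistent weakness would show up as a clearly positive value.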

Game Theory (time for some)

In the QREAM example, predictivity is elusive because hockey is not static: all players and coaches in question are optimizers trying their best to generate or prevent goals at any time. Both teams are constantly making adjustments, sometimes strategically and sometimes unconsciously. As a data scientist, when I analyse 750,000 shots over 10 seasons, I only see what happened, not what didn’t happen. If in one season goalie A underperformed the average on shots from left shooters from the left side of the ice, that would show up in the data, but it would be noticed by players and coaches more quickly and in a much more meaningful and actionable way (maybe it was the result of hand placement, lack of squareness, cheating to the middle, defenders who let up cross-ice passes from right to left more often than expected, etc.). The goalie and defensive team would also pick up on these trends and understandably compensate, maybe even slightly over-compensate, which would open up other scoring options for attackers, which the goalie would adjust to, and so on until the game reaches some sort of multi-dimensional equilibrium (actual game theory). If a systemic trend did continue, there’s a good chance that goalie would be out of the league. Either way, trying to capture a meaningful, actionable insight from the analysis is much like trying to capture lightning in a bottle: finding a reliable pattern in a game where both sides are constantly adjusting and counter-adjusting is very difficult.

This isn’t to say the analysis can’t be improved. My expected goal model has weaknesses and will always have limitations due to data and user error. That said, I would expect the insights of even a perfect model to be arbitraged away. More shockingly (since I haven’t looked at this in-depth, at all), I would expect the recent trend of NBA teams fading the use of mid-range shots to reverse in time as more teams counter that with personnel and tactics. A smart team could then probably exploit that set-up by employing slightly more mid-range shots, and so on, until a new equilibrium is reached. See you all at Sloan 2020.

Data On Ice

The role of analytics is to provide a new lens to look at problems and make better-informed decisions. There are plenty of examples of applications at the hockey management level to support this: data analytics have aided draft strategy and roster composition. But bringing advanced analytics to on-ice strategy will likely continue to chase adjustments players and coaches are already constantly making. Even macro-analysis can be difficult once the underlying inputs are considered.
An analyst might look at strategies to enter the offensive zone, where you can either forfeit control (dump it in) or attempt to maintain control (carry or pass it in). If you watched a sizable sample of games across all teams and a few different seasons, you would probably find that you were more likely to score a goal if you tried to pass or carry the puck into the offensive zone than if you dumped it. Actionable insight! However, none of these plays occurs in a vacuum – a true A/B test would have the offensive players randomise between dumping it in and carrying it. But the offensive player doesn’t randomise; they are making what they believe to be the right play at that time, considering things like offensive support, defensive pressure, and their own and their teammates’ shift length. In general, when they dump the puck, they are probably trying to make a poor position slightly less bad and get off the ice. A randomised attempted carry-in might be stopped and result in a transition play against. So, the insight of not dumping the puck should be changed to ‘have the 5-player unit be in a position to carry the puck into the offensive zone,’ which encompasses more than a dump/carry strategy. In that case, this isn’t really an actionable, data-driven strategy, but rather an observation. A player who dumps the puck more often likely does so because they struggle to generate speed and possession from the defensive zone, something that would probably be reflected in other macro-stats (e.g. the share of shots or goals they are on the ice for). The real insight is that the player probably has some deficiencies in their game. And this is where the underlying complexity of hockey begins to grate at macro-measures of hockey analysis: there are many little games within the game, player-level optimisation, and second-order effects that make capturing true actionable, data-driven insight difficult.[1]
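The selection effect in the dump/carry example can be illustrated with a deterministic toy model. Everything here is an assumption made for illustration: ‘situation quality’ is a hypothetical score from 0 to 1 (support, speed, gap), and the two value functions are invented net expected goals per entry, not estimates from real data.

```python
def carry_value(quality):
    """Hypothetical net expected goals from attempting a carry-in.
    Poor situations carry turnover risk, so the value can be negative."""
    return 0.06 * quality - 0.01


def dump_value(quality):
    """Hypothetical net expected goals from a dump-in (flat, low value)."""
    return 0.01


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Eleven equally likely situations, from terrible (0.0) to ideal (1.0).
situations = [i / 10 for i in range(11)]

# Players are optimizers: they carry only when it looks like the better play.
carried = [q for q in situations if carry_value(q) > dump_value(q)]
dumped = [q for q in situations if carry_value(q) <= dump_value(q)]

# What an analyst observes in the data.
observed_carry = mean(carry_value(q) for q in carried)
observed_dump = mean(dump_value(q) for q in dumped)

# What happens under each policy across all situations.
always_carry = mean(carry_value(q) for q in situations)
player_choice = mean(max(carry_value(q), dump_value(q)) for q in situations)
```

In this toy set-up the observed carry-ins look far better than the observed dump-ins, yet forcing a carry in every situation does worse than letting the player choose. The observed gap mostly reflects the situations in which each play was chosen.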
It can be done, though in a round-about way. Like many, I support the idea of using (more specifically, testing) 4 or even 5 forwards on the powerplay. However, it’s important to remember that analysis showing a 4F powerplay to be more effective is more a reflection of the personnel of the teams that elect to use that strategy than of the effectiveness of that strategy in a vacuum. And teams will work to counter by maximising their chance of getting the puck and attacking the forward on defence with increased aggressiveness, which may be countered by a second defenseman, and so forth.

Game Theory (revisited & evolved)

Where analytics looks to build strategic insights on a foundation of shifting sand, there’s an equally interesting force at work – evolutionary game theory. Let’s go back to the example of the number of forwards employed on the powerplay: teams can use 3, 4, or 5 forwards. In game theory, we look for a dominant strategy first. While self-selected 4-forward powerplays are more effective, a team shouldn’t necessarily employ one if up by 2 goals in the 3rd period, since a marginal goal for is worth less than a marginal goal against. And because 4-forward powerplays, intuitively, are more likely to concede chances and goals against than 3F-2D, 4F-1D is not a dominant strategy. Neither are 3F-2D or 5F-0D.
Thought experiment. Imagine that in the first season every team employed 3F-2D. In season 2, one team employs a 4F-1D powerplay 70% of the time. They would have some marginal success because the rest of the league is configured to oppose 3F-2D, and in season 3 this strategy replicates, with more teams running a 4F-1D in line with evolutionary game theory. Eventually, say in season 10, more teams might run a 4F-1D powerplay than 3F-2D, and some even 5F-0D. However, penalty kills will also adjust to counter-balance and the game will continue. There may or may not be an evolutionarily stable strategy where teams are best served mixing strategies, like you would playing rock-paper-scissors.[2] I imagine the proper strategy would depend on score state (primarily) and respective personnel.
You can imagine a similar game representing the function of the first forward in on the forecheck. They can go for the puck or hit the defenseman – always going for the puck would let the defenseman become too comfortable, letting them make more effective plays, while always hitting would take the forechecker out of the play too often, conceding too much ice after a simple pass. The optimal strategy is likely randomising, say, hitting 20% of the time, factoring in gap, score, personnel, etc.

A More Robust (& Strategic) Approach

Even if it seems a purely analytics-driven strategy is difficult to conceive, there is an opportunity to take advantage of this knowledge. Time is a more robust test of on-ice strategies than p-values. Good strategies will survive and replicate; poor ones will (eventually and painfully) die off. Innovative ideas can be sourced from anywhere and employed in minor-pro affiliates, where the strategies’ effects can be quantified in a more controlled environment. Each organisation has hundreds of games a year in its control and can observe many more. Understanding that building an analytical case for a strategy may be difficult (coaches are normally sceptical of data, maybe intuitively for the reasons above), analysts can sell the merit of experimenting and measuring, giving the coach major ownership of what is tested. After all, it pays to be first in a dynamic game such as hockey. Bobby Orr changed the way blueliners played. New blocking tactics (and equipment) led to improved goaltending. Hall-of-Fame forward Sergei Fedorov was a terrific defenseman on some of the best teams of the modern era.[3] Teams will benefit from being the first to employ (good) strategies that other teams don’t see consistently and don’t devote considerable time preparing for.
The game can also improve using this framework. If leagues want to encourage goal scoring, they should encourage new tactics by incentivising goals. I would argue that the best and most sustainable way to increase goal scoring would be to award AHL teams 3 points for scoring 5 goals in a win. This will encourage offensive innovation and heuristics that would eventually filter up to the NHL level. Smaller equipment or bigger nets are susceptible to second-order effects. For example, good teams may slow down the game when leading (since a marginal goal for is now worth less than a marginal goal against), making the on-ice product even less exciting. Incentives and innovation work better than micro-managing.

In Sum

The primary role of analytics in sport and business is to deliver actionable insights using the tools at their disposal, whether that is statistics, math, logic, or whatever. With current data, it is easier for analysts to observe results than to formulate superior on-ice strategies. Instead of struggling to capture the effect of strategy in biased data, they should use this to their advantage and look at these opportunities through the prism of game theory: test, measure, and let the best strategies bubble to the top. Even the best analysis might fail to pick up on some second-order effect, but thousands of shifts are less likely to be fooled. The data is too limited in many ways to paint the complete picture. A great analogy came from football (soccer) analyst Marek Kwiatkowski:

Almost the entire conceptual arsenal that we use today to describe and study football consists of on-the-ball event types, that is to say it maps directly to raw data. We speak of “tackles” and “aerial duels” and “big chances” without pausing to consider whether they are the appropriate unit of analysis. I believe that they are not. That is not to say that the events are not real; but they are merely side effects of a complex and fluid process that is football, and in isolation carry little information about its true nature. To focus on them then is to watch the train passing by looking at the sparks it sets off on the rails.

Hopefully, there will soon be a time when every event is recorded, and in-depth analysis can capture everything necessary to isolate things like specific goalie weaknesses, optimal powerplay strategy, or best practices on the forecheck. Until then there are underlying forces at work that will escape detection. But it’s not all bad news: the best strategy is to innovate and measure. This may not be groundbreaking to the many innovative hockey coaches out there, but it can help focus the smart analyst on delivering something actionable.

____________________________________________

 

[1] Is hockey a simple or complex system? When I think about hockey and how to best measure it, this is a troubling question I keep coming back to. A simple system has a modest amount of interacting components and they have clear relationships to other components: say, when you are trailing in a game, you are more likely to out-shoot the other team than you would otherwise. A complex system has a large number of interacting pieces that may combine to make these relationships non-linear and difficult to model or quantify. Say, when you are trailing the pressure you generate will be a function of time left in the game, respective coaching strategies, respective talent gaps, whether the home team is line matching (presumably to their favor), in-game injuries or penalties (permanent or temporary), whether one or both teams are playing on short rest, cumulative impact of physical play against each team, ice conditions, and so on.

Fortunately, statistics are such a powerful tool because a lot of these micro-variables even out over the course of the season, or possibly the game, to become net neutral. Students learning about gravitational force don’t need to worry about molecular forces within an object; the system (e.g. a block sliding on an inclined slope) can be separated from the complex and simplified. Making the right simplifying assumptions, we can do the same in hockey, but we do so at the risk of losing important information. More convincingly, we can also attempt to build out the entire state-space (e.g. different combinations of players on the ice) and use machine learning to find patterns between the features and winning hockey games. This is likely being leveraged internally by teams (who can generate additional data) and/or professional gamblers. However, even with machine learning techniques applied, there appeared to be a theoretical upper bound on single-game prediction accuracy of only about 62%. The rest, presumably, is luck. Even if this upper bound softens with more data, such as biometrics and player tracking, prediction in hockey will still be difficult.

It seems to me that hockey is suspended somewhere between the simple and the complex. On the surface, there’s a veneer of simplicity and familiarity, but perhaps there’s much going on underneath the surface that is important but can’t be quantified properly. On a scale from simple to complex, I think hockey is closer to complex than simple, but not as complex as the stock market, for example, where upside and downside are theoretically unlimited and not bound by the rules of a game or a set amount of time. A hockey game may be 60 on a scale of 0 (simple) to 100 (complex).

[2] Spoiler alert: if you perform the same thought experiment with rock-paper-scissors you arrive at the right answer – randomise between all 3, each 1/3 of the time – unless you are a master of psychology and can read those around you. This obviously has a closed form solution, but I like visuals better:
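For readers who prefer the closed form, here is a small numerical check. It is a sketch that assumes the standard payoffs of +1 for a win, 0 for a tie, and -1 for a loss; the function name is hypothetical. It computes what a mixed strategy earns against an opponent who knows the mix and picks the best reply.

```python
from fractions import Fraction

# Row player's payoff. Rows and columns are ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def worst_case_value(mix):
    """Expected payoff of `mix` against the opponent's best pure reply."""
    return min(
        sum(mix[i] * PAYOFF[i][j] for i in range(3))
        for j in range(3)
    )


uniform = [Fraction(1, 3)] * 3
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

uniform_value = worst_case_value(uniform)        # cannot be exploited
rock_heavy_value = worst_case_value(rock_heavy)  # opponent leans on paper
```

The uniform mix guarantees an expected payoff of zero whatever the opponent does, while any lean toward one option hands the opponent a profitable reply. That is the sense in which the 1/3 each answer is stable.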

[3] This likely speaks more to personnel than tactics; Fedorov could have been peerless. However, I think of football, where position changes are more common, e.g. a forgettable college receiver at Stanford switched to defence halfway through his college career and became a top player in the NFL: Richard Sherman. Julian Edelman was a college quarterback and is now a top receiver on the Super Bowl champions. Test and measure.

CrowdScout Score and Salary – A Study in Market Value

It’s All Relative

In a salary cap league, how teams spend their finite budget has become very important to any present or future success.[1] The relative value of a contract is often more important than the absolute value of the contract. Within a very strict set of contract rules, teams will devote a share of their allotted cap space to a player at a price dependent on a number of market forces. The goal of this study is to determine what that price should be considering some of those market forces to compare to the actual salary.

So, how do we go about determining the market rate?[2] First, it helps to make some simplifying assumptions – we expect the cap-hit or AAV (Annual Average Value of the contract) to probably be a function of:

  • Position – different positions are valued slightly differently. Any contract negotiation anchor would consist of comparables playing the same position.
  • Age – the NHL’s not-so-free labor market puts significant restrictions and limitations on young players’ earnings. Thus, any analysis looking at market rate should factor in age.
  • Skill / ability / comprehensive contribution to winning – the player’s perceived ability will determine market value. Unlike age and position, skill is extremely difficult to accurately gauge and forecast (since many deals are multi-year). This will pose the biggest obstacle to a clean quantitative analysis. Across all sports, teams consistently misvalue player ability, most notoriously over-valuing their ability and overpaying them.
  • Contract Length (Term) – There are different interactions between age, term, and AAV. A short contract length might signal less money (a ‘show me’ bridge contract) for a young RFA or more money (player trading longer term for higher AAV) for an older UFA. Data courtesy of generalfanager.com.
  • Projected Salary Cap at Contract Date – A $5M AAV contract signed in the summer of 2009 is not the same as a contract signed in the summer of 2016. Managers are forward-looking allocating a set percentage of their expected salary cap to a player rather than an absolute amount. Data courtesy of generalfanager.com.

Finding Value

To determine how each player’s cap-hit stacks up against what we would expect, we must create a formula or algorithm to return each player’s expected AAV. Finding the difference between the expected AAV and actual AAV – or residual – would signal the relative value of their cap-hit. Spending a million less than market forces would expect (or, more specifically, than our model would predict) allows the team to either save that money or invest it elsewhere.

A model can be built using the features discussed above, predicting AAV as a function of age, position, and ability – the catch-all for talent or skill or whatever. But how do we comprehensively quantify ability, the age old question?

One Feature to Rule Them All

My baseline method will be to use GAR (Goals Above Replacement) from war-on-ice.com to help predict salary. GAR is a notable attempt to assign numerical credit to players based on their team winning, which proves a decent proxy for ability. However, GAR or any ‘be all, end all’ stat has limitations – injuries interrupt accumulation of goals above replacement and defensive contributions are very difficult to quantify, among other things. No algorithm is omnipotent, but GAR is a very helpful attempt at answering this question.

In addition to GAR, I will use data collected from my project, CrowdScout Sports, designed to smartly aggregate user judgment. It has been in beta over the course of the 2015-16 season with over 100 users making over 32,000 judgments on players relative to each other. With advanced metrics provided, a diversity of users, and the best forecasters gaining influence, I hope the data provides an increasingly reliable, comprehensive player rating metric. The rating is intended to answer the question posed to the user as they are prompted to rank two randomly chosen players – if the season started today, which player would you choose if the goal were to win a championship.[3]

Both metrics will be used as a proxy for ability when trying to explain AAV, data courtesy of generalfanager.com. Both metrics are designed not to be influenced by cap-hit, a necessity for the model to properly explain cap-hit.

GAR Linear Model

First, let’s explore the relationship between AAV and term, salary cap expectations, position, age, and ability using the GAR metric. Using 2014-15 data[4] from war-on-ice.com and their GAR model, a dataset containing player features at the onset of the 2014-15 season was assembled. The AAV of the upcoming 2015-16 season (where the player was signed prior to the season) was targeted. Any incomplete records were removed. The age variable was transformed into a bucketed variable since there isn’t a linear relationship between age and AAV, but rather different levels of pay by age. The natural bucketing of age in relation to cap-hit is:

  • 18-21 – Entry Level Contract (ELC) players
  • 22-24 – A mix of ELCs, bridge contracts, and a few high fliers who get paid
  • 25-27 – RFA controlled, second contract players in their early prime
  • 28-31 – UFA contract years (likely higher cap-hit) but players likely to still be in their prime
  • 32-35 – UFA contract years with some expected decline in ability
  • Over 35 – Declining ability compounded with specific contract rules for 35 plus players
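The bucketing above can be written as a small lookup. This is a minimal sketch: the function name and the label strings are hypothetical, and the boundaries are taken directly from the list.

```python
def age_bucket(age):
    """Map a player's age in years to the contract-stage bucket listed above."""
    if age <= 21:
        return "18-21"    # entry-level contract players
    if age <= 24:
        return "22-24"    # ELCs, bridge deals, a few early stars
    if age <= 27:
        return "25-27"    # RFA-controlled second contracts
    if age <= 31:
        return "28-31"    # UFA years, still in prime
    if age <= 35:
        return "32-35"    # UFA years with expected decline
    return "Over 35"      # 35-plus contract rules apply


buckets = [age_bucket(age) for age in (19, 24, 25, 31, 35, 38)]
```

Each bucket then enters the linear model as its own 1/0 indicator, which lets pay step up by contract stage rather than forcing a straight line through age.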

The 924 remaining players were then split into 10 folds to cross-validate the Generalized Linear Model (GLM) – iteratively training on 90% of the data and testing out of sample on the remaining unseen 10% of data, then combining the 10 models. The cross-validated model is then used to score the original dataset – the coefficients from the GLM are multiplied by each player’s individual variables – age (1/0 for each bucket), position (1/0 for each position), contract length, projected cap, and GAR. The outcome is the expected AAV.
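The fold mechanics can be sketched in a few lines. This is not the code behind the output that follows: it assumes contiguous folds (in practice the rows would be shuffled first), a single predictor fitted by ordinary least squares, and errors pooled across folds. All function names and the toy data are hypothetical.

```python
from math import sqrt


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def cross_validated_rmse(xs, ys, k=10):
    """Train on k-1 folds, score the held-out fold, pool the squared errors."""
    squared_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        squared_errors += [(ys[i] - (intercept + slope * xs[i])) ** 2
                           for i in test_idx]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy data that is exactly linear, so the out-of-sample error is ~0.
xs = list(range(50))
ys = [0.5 * x + 1 for x in xs]
rmse = cross_validated_rmse(xs, ys)
fold_sizes = [len(fold) for fold in k_fold_indices(924, 10)]
```

Because every prediction is made on a row the model did not see during fitting, the resulting RMSE is an out-of-sample estimate rather than a measure of fit to the training data.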

Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 836, 837, 838, 838, 838, 836, ...
Resampling results:

  RMSE     Rsquared
  1.10514  0.7112609

                   Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)       -3.109087    0.838919   -3.706  0.0002   ***
GAR                0.066895    0.005919   11.301  < 2e-16  ***
age_group21-24    -0.022097    0.165447   -0.134  0.8938
age_group24-28     0.473283    0.163387    2.897  0.0039   **
age_group28-31     0.812176    0.17113     4.746  0        ***
age_group31-35     1.078278    0.179754    5.999  0        ***
age_groupgt35      1.819195    0.21776     8.354  0        ***
PosD               0.129242    0.099992    1.293  0.1965
PosG               0.218529    0.140888    1.551  0.1212
PosW              -0.112796    0.095704   -1.179  0.2389
Contract Length    0.673488    0.021353   31.541  < 2e-16  ***
Projected.Cap      0.041061    0.011842    3.467  0.0006   ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Our simple GLM explains about 71% of the variance in cap-hit (a cross-validated R-squared of 0.71). GAR, Contract Length, and Projected Cap are all strong positive predictors. Pay generally rises with each successive age bucket. Of note, the 22-24 age bucket has the weakest age coefficient, since at that age some players are on their ELC while others have earned legitimate star contracts. In this model, position wasn’t a significant predictor, although it signals defensemen and goaltenders probably go at a premium to centers, while wingers take a discount.
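To make the scoring step concrete, the coefficients from the GAR model output can be applied to a single player. This is a sketch under stated assumptions: AAV and projected cap are taken to be in millions of dollars (the post does not state units), centers and the youngest age bucket are treated as the baseline categories with no coefficient, and the example player is entirely hypothetical.

```python
# Coefficients copied from the GAR model output.
COEFFICIENTS = {
    "(Intercept)": -3.109087,
    "GAR": 0.066895,
    "age_group21-24": -0.022097,
    "age_group24-28": 0.473283,
    "age_group28-31": 0.812176,
    "age_group31-35": 1.078278,
    "age_groupgt35": 1.819195,
    "PosD": 0.129242,
    "PosG": 0.218529,
    "PosW": -0.112796,
    "Contract Length": 0.673488,
    "Projected.Cap": 0.041061,
}


def expected_aav(gar, age_group, position, contract_length, projected_cap):
    """Intercept plus each coefficient times the player's value for it."""
    total = COEFFICIENTS["(Intercept)"]
    total += COEFFICIENTS["GAR"] * gar
    # Baseline categories (assumed: youngest bucket, centers) add nothing.
    total += COEFFICIENTS.get("age_group" + age_group, 0.0)
    total += COEFFICIENTS.get("Pos" + position, 0.0)
    total += COEFFICIENTS["Contract Length"] * contract_length
    total += COEFFICIENTS["Projected.Cap"] * projected_cap
    return total


# Hypothetical player: 10 GAR, 28-31 bucket, defenseman, 5-year deal,
# signed when the projected cap was 70 (assumed $M), paid 4.0 (assumed $M).
predicted = expected_aav(10, "28-31", "D", 5, 70)
residual = predicted - 4.0  # expected less actual: positive is surplus value
```

For this made-up player the model expects roughly 4.74, so a 4.0 cap-hit would show up as about 0.74 of surplus value to the team.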

The player-level residuals (expected AAV less actual AAV, a positive value representing surplus value to the team) are plotted below. The model would be stronger but for some significant outliers – Jonathan Toews, Patrick Kane, Thomas Vanek, and Tyler Myers were all paid about $4M more than the model expected. Conversely, Duncan Keith, Roberto Luongo, and Marian Hossa were all underpaid by at least an expected $4M. Like most linear models, it had trouble predicting a non-normal target. That is, the distribution of AAV values had a skew to the right, where the model struggled to pick up ‘extreme’ values. Transforming AAV into a log of AAV did not increase predictive power.

WAR.LM.Players

Crowd Wisdom

The next iteration of the GLM was run using the CrowdScout score as a proxy for ability. A few notes on the inclusion of this data:

  • What is this metric? It represents the relative strength of that player’s Elo rating compared to the entire population at the time of analysis. The Elo rating is the cumulative result of over 100 scouts selecting between two randomly generated (but generally similar) players some 32,000 times. Each of these selections feeds into an algorithm that adjusts each player’s score based on the prior probability of the match-up and the k-factor given to the user – the more active and accurate that user had been historically, the greater their influence.
  • I think skepticism should be applied to any analysis performed on data acquired through some level of effort of the owner. That said, the CrowdScout data is the result of my own engineering project and is intended to aid (fantasy) managerial decision-making, rather than provide advanced analytical insight. Any clean, methodologically tight analysis would be a bonus.
  • There is a concern of collinearity in this analysis, since it is possible a subset of users associated higher salary with better ability, as opposed to the reverse. Conversely, an obviously overpaid player can be under-rated due to an emotional discounting of their ability. For the purpose of this analysis, we will assume the effects neutralize each other and in aggregate AAV did not significantly impact the CrowdScout score.[5] There will obviously be a correlation between player score and AAV, but that does not imply causation.
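The rating update described in the first note follows the general shape of an Elo system. The sketch below uses the textbook Elo formulas; the 400-point scale, the starting ratings, and the two k-factor values are assumptions for illustration and are not CrowdScout’s actual parameters.

```python
def expected_score(rating_a, rating_b):
    """Prior probability that player A is chosen over player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update(rating_a, rating_b, a_chosen, k):
    """Move both ratings by k times the surprise in the user's selection."""
    surprise = (1.0 if a_chosen else 0.0) - expected_score(rating_a, rating_b)
    delta = k * surprise
    return rating_a + delta, rating_b - delta


# The same selection made by a trusted user (larger k) and by a new user
# (smaller k). Both k values are hypothetical.
trusted = update(1500, 1500, a_chosen=True, k=32)
newcomer = update(1500, 1500, a_chosen=True, k=16)
```

An expected selection barely moves either rating, an upset moves both a lot, and the user’s k-factor scales how much any single judgment counts.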

With the CrowdScout data, I kept all players from the 2015-16 season who had been judged at least 70 times, effectively dropping players who did not spend a significant amount of time on an NHL roster or didn’t receive many implied ratings from a diverse set of users. A dataset containing position, age bucket (same buckets as GAR Linear Model) as of 10/1/2015[6], and CrowdScout score as of 5/25/2016 was constructed for 548 players. A model was then built cross-validating 10 folds from the data, testing each model on unseen, out-of-sample subsets.


Resampling: Cross-Validated (10 fold)
Summary of sample sizes: 494, 494, 493, 494, 492, 492, ...
Resampling results:

  RMSE      Rsquared
  1.039983  0.7632717

                        Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)            -4.059971    0.99779    -4.069  0.0001        ***
CrowdScout Score        0.040156    0.002344   17.129  < 2e-16       ***
age_group21-24         -0.048386    0.387105   -0.125  0.90057
age_group24-28          0.716755    0.378693    1.893  0.05894       .
age_group28-31          1.148196    0.386094    2.974  0.00307       **
age_group31-35          1.530753    0.389492    3.93   0.000096      ***
age_groupgt35           2.2832      0.422219    5.408  0.0000000963  ***
PosD                   -0.151021    0.123711   -1.221  0.22272
PosG                   -0.050527    0.171222   -0.295  0.76803
PosW                   -0.126127    0.122439   -1.03   0.30342
Term                    0.474544    0.027242   17.419  < 2e-16       ***
Projected.Cap.K.Date    0.042666    0.013781    3.096  0.00206       **

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The same model methodology using CrowdScout score as a proxy for ability explains about three-quarters of AAV. Like the GAR model, ‘ability’ has a strong positive relationship with AAV. Pay increases with age, with jumps in expected pay from 21-24 to 24-28 and then again as players hit unrestricted free agency at around 28. Defensemen, goaltenders, and wingers all carry a small discount relative to centers, all else equal, although none of the position effects is significant.

Using CrowdScout as a proxy for ability creates a better fitting model compared to using GAR. This is consistent with what we would expect to see, since CrowdScout data doesn’t have to worry about players missing games due to injury. This is a study into what we would expect players to be paid, rather than what players should be paid, so the CrowdScout score is very likely baking in some reputational assessments, leading to a stronger relationship with cap-hit. It’s also possible that crowd wisdom is able to determine the impact defensive prowess has on comprehensive ability better than most public data.

Player-Level Residuals:

CS.LM.Players

Team-Level Residuals:

CS.LM.Team

This analysis also measures spending efficiency based on the 2015-16 AAV and ability, because the CrowdScout Score was not available at the start of the season. However, we can create a predicted CrowdScout Score from the 2014-15 season to hold up against 2015-16 AAV, since teams can only act on past performance and project out.

Paid Against the Machine

The original goal of the analysis was to compare player cap-hit to the expected cap-hit. A simple linear model explaining AAV as a function of age, position, term, projected cap at the time of the deal, and CrowdScout score does a good job predicting cap-hit. However, we can also explore additional modeling methods, increasing the depth of interactions between variables (e.g. age and draft year) and strengthening the predictive power. I will make an adjustment to the CrowdScout Score and use a machine learning model which will be able to handle the additional interactions between features:

  • Predicted CrowdScout Score – As outlined here, the CrowdScout Score can be reliably predicted using on-ice metrics. I will score each player’s 2014-15 statistics from puckalytics.com with the GLM and Random Forest model and take the average of the predicted scores. This will replace the actual CrowdScout Score in the model, which can be biased.
  • Age (as of season start, 10/1/2015) – Move from a strictly bucketed age to a continuous age variable, to help aid the different interactions. This would not work in a linear model, Jagr would mess everything up.
  • Contract Length – Length has proved to be a key explanatory variable. Data courtesy of generalfanager.com.
  • Projected Salary Cap at Contract Date – Also a key explanatory variable. Data courtesy of generalfanager.com.
  • Drafted Boolean – The interaction between whether the player was drafted or not, term, and age should help the model work out whether the player is on an ELC, 2nd contract, or UFA contract.

In order to handle interactions between the new variables in the model, an ensemble of regression trees will be used, known as the Random Forest algorithm. A Random Forest builds many decision trees, each from a random subset of variables and observations, and every ‘tree’ is then considered when scoring or predicting an observation. The advantage of this algorithm is that it is extremely powerful. The disadvantage is that it is basically a black box: there are no clean, interpretable parameters to say ‘when all else is equal we expect a player moving from the 31-35 age group to over 35 to be paid about $500k more’ like in a GLM.

A 500-tree model brought the RMSE under 0.5, with an R-squared close to 0.95.
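To show the mechanics without a modeling library, here is a deliberately stripped-down toy: 500 one-split trees (‘stumps’), each fitted to a bootstrap sample using one randomly chosen feature, with predictions averaged. A real Random Forest grows much deeper trees and samples features at every split, so this is only a sketch of the bagging idea. The six players, their scores, terms, and AAVs (assumed $M) are invented.

```python
import random


def fit_stump(rows, targets, feature):
    """Best single split on one feature, chosen to minimise squared error."""
    best = None
    for threshold in sorted({row[feature] for row in rows})[:-1]:
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # feature is constant in this sample: predict the mean
        mean = sum(targets) / len(targets)
        return feature, float("inf"), mean, mean
    return feature, best[1], best[2], best[3]


def fit_forest(rows, targets, n_trees=500, seed=0):
    """Fit each stump on a bootstrap sample with one random feature."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]
        feature = rng.randrange(n_features)
        forest.append(fit_stump([rows[i] for i in sample],
                                [targets[i] for i in sample], feature))
    return forest


def predict(forest, row):
    """Average the predictions of every stump in the forest."""
    votes = [left if row[feature] <= threshold else right
             for feature, threshold, left, right in forest]
    return sum(votes) / len(votes)


# Hypothetical players: (predicted CrowdScout score, term in years) -> AAV.
ROWS = [(40, 1), (55, 2), (60, 3), (70, 5), (80, 6), (88, 8)]
AAV = [0.9, 1.5, 2.5, 4.5, 6.0, 9.0]

forest = fit_forest(ROWS, AAV)
top = predict(forest, (88, 8))
bottom = predict(forest, (40, 1))
```

Because every prediction is an average of averages, the toy can never predict above the highest AAV it was trained on, which hints at why tree ensembles tend to undershoot the very largest contracts.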

Despite the lack of coefficients, we can also take a peek under the hood to check how important each variable is in the algorithm’s decision-making.

AAV.RF.Varimp

The CrowdScout Score and Term variables are the most important variables in the Random Forest model when explaining AAV. That is, when they are used to create a ‘tree’ or decision, they cumulatively reduce the sum of squared residuals more than the other variables. Age, which should work in tandem with draft history and term, was also important. Projected Cap had some influence, Draft History even less so. Team salary and position (consistent with the linear models) were the least important, having no influence in the enhanced model, and were dropped.
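The ‘reduction in the sum of squared residuals’ behind this kind of importance measure can be computed for a single split. This sketch uses invented data (term in years, a 0/1 position flag, and AAV in assumed $M); in a real forest the gains are accumulated over every split in every tree.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(feature_values, targets, threshold):
    """Drop in squared error from splitting the targets at `threshold`."""
    left = [t for x, t in zip(feature_values, targets) if x <= threshold]
    right = [t for x, t in zip(feature_values, targets) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing explains nothing
    return sse(targets) - sse(left) - sse(right)


# Hypothetical players.
term = [1, 2, 3, 5, 6, 8]
is_winger = [0, 1, 0, 1, 0, 1]
aav = [0.9, 1.5, 2.5, 4.5, 6.0, 9.0]

gain_term = split_gain(term, aav, threshold=3)
gain_position = split_gain(is_winger, aav, threshold=0)
```

In this made-up sample, splitting on term removes far more squared error than splitting on position, which is the pattern the importance plot reports for the real model.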

Note that when 2014-15 GAR was added to a dataset of non-rookie players and included in the Random Forest model, its importance was around that of age and it did not improve the performance of the model.[7]

The Random Forest model still has trouble predicting very high cap-hits. For example, Patrick Kane and Jonathan Toews and their AAV of $10.5M are considered to be overpaid by over $1M when compared to market value, Toews slightly more with a 78 predicted CrowdScout Score compared to Kane’s 86. With a predicted CrowdScout Score of 88, Alex Ovechkin makes $1.2M more than the model would predict. On the flip side, Justin Abdelkader was underpaid by about $2M in the Random Forest model last season. Interestingly, this summer he received a raise of almost the same amount. Patrick Eaves was also underpaid last year by over a million. He was notably underpaid in both GLM models, using Elo and GAR – sporting a healthy predicted CrowdScout Score of 58 and 2014-15 GAR of 13.8, he was a 31-year-old winger paid a paltry $1.15M. Other players making about a million less than predicted during the 2015-16 season were Morgan Rielly, Mattias Ekholm, and Kyle Okposo – all of whom received healthy raises this summer.

AAV.RF.Player

At a team level, the Islanders, Hurricanes, and Predators led the way in contracting players for less than market value last year. The Islanders received strong value from pending free agents Nielsen and Okposo. The Hurricanes had positive value across the board less Skinner. The Predators are frugal by design, extracting value from their young defense. Note that this analysis fails to include goaltending, where Rinne and Ward would move each team down.

The Avalanche, Flames, and Rangers had the worst value from their contracts. Colorado has very few good contracts when compared to the market. The Flames had a few bad contracts on defense and did not receive any sort of bonus from having top players on ELCs. The Rangers were also pulled down by an overpaid defense.

Also note that the error terms here are small and it wouldn’t take much to move a team up or down the rankings. It also demonstrates that the future is tough to predict and few managers can avoid making salary allocation errors every now and then.

AV.RF.Team

Conclusion

It is critical that NHL franchises effectively manage their salary cap in order to be viable. It appears a model can explain about 95% of the market for NHL talent. This feels about right: some deals are visibly off from the start and some valuations will change with time, but most of the time teams and agents are in line with what the market would expect as a function of the player’s age, draft year, position, term, team salary, and ability. In this study, it appears holding up data from the CrowdScout project to objective on-ice features provided a good proxy for ability.

The Random Forest model is quite strong, with 5% of contracts left unexplained. Some share of this is mis-valuation of the player and market, some of it is inaccuracies of the CrowdScout rating and modeling, some of it might be unexplainable (discount to stay close to family, injury or character concerns, etc.). We are specifically interested in quantifying the first term – how teams might misvalue certain players. With a relatively small error term, it is possible the majority of these residuals are made up of the unquantifiable and the majority of team-level differences is noise. Eye-balling teams in the top 5 and bottom 5 by spending efficiency passed the sniff test, but most managers and agents settle on deals that are in line with the league market.

Finally, it’s important to remember this is a study in what we expect a player’s cap-hit to be given market conditions, rather than what they should make in a free-market NHL. Players on ELCs often provide teams very good value relative to their contract, but in this analysis there is no bonus for production from ELCs since the player age and contract length often signaled when players are likely to be on an ELC. The expected AAV is also calculated with perfect information at the start of the 2015-16 season, whereas actual deals have to project out future performance during contract discussions. This alternative analysis might be looked at in the near future, expecting considerably larger error terms – longer timelines introduce more uncertainty.

It’s also important to remember that this analysis leans on ever-maturing data from the CrowdScout project. As expected, it contains enough reputational information to help build a stronger model than using GAR from war-on-ice.com as a proxy for ability. It is possible that this data contains systemic bias – if a higher salary caused the CrowdScout Score to be higher, rather than them simply being correlated. A simple plot (below) suggests that the CrowdScout Score often differs from AAV, which is encouraging. Given that, I hope this unique dataset and model will prove helpful in evaluating contracts and cap management in the future.

Huge thanks to asmean for contributing to this study, specifically advising on machine learning methods.

AAVvScore

______________________________________________________

[1] If a team can consistently acquire and retain talented players who consistently play above their expected contract, they will be operating with a significant advantage. If your 24-year old top 4 defenseman is signed at $4.5M AAV and most comparable players are averaging over $5M AAV, more depth or quality can be acquired elsewhere. If your mid-range starting goalie makes $6M and the goaltending market falls out and sees comparables average less than $5M, you are at a disadvantage. Easy enough.

[2] In absolute terms, that’s a very tough question. The NHL labor market is a long way from the economic-textbook-supply-meets-demand-free-efficient-market. There are salary floors, ceilings, team floors, team ceilings, bonuses, and rules regarding age and accrued seasons. Deals are often made with little certainty of future performance (read: teams are poor at forecasting individual player career arcs), and often see a trade-off in salary and duration. An efficient market this is not.

[3] A model is only as good as its target variable, and I believe any comprehensive analysis of ability should attempt to answer that question or one similar to it. Hockey is a goal-scoring contest first and foremost, but the ultimate goal (winning the championship) resembles a marathon of hockey games. This is a tricky distinction since it invites past winners to be overrated, when in alternative histories they did not win, thanks to luck. This is certainly a deeper philosophical question, but an analysis in market value should only care about results.

[4] 2015-2016 GAR has not been, or will not be, posted.

[5] As opposed to simply over-rating a player based on reputation and other biases. The system is designed to reward those users who have the foresight to forecast declining ability of a player getting by on reputation alone. Some reputational bias will be present until the time a sizeable crowd of excellent forecasters exists.

[6] Presumably when most players were under contract for the 2015-16 season.

[7] varimp

The Path to WAR*

*Wins-Above-Replacement-Like Algorithm-Based Rating

Dream On

The single metric dream has existed in hockey analytics for some time now. The most relevant metric, WAR or Wins Above Replacement, represents an individual player’s contribution to the success of their team by attempting to quantify the number of goals they add over a ‘replacement-level’ player. More widely known in baseball, WAR in hockey is much tougher to delineate, but has been attempted, most notably at the excellent, but now defunct, war-on-ice.com. The pursuit of a single, comprehensive metric has been attempted by Ryder, Awad, Macdonald, Schuckers and Curro, and Gramacy, Taddy, and Jensen.

Their desires and effort are justified: a single metric, when properly used, can be used to analyze salaries, trades, roster composition, draft strategy, etc. Though it should be noted that WAR, or any single number rating, is not a magic elixir since it can fail to pick up important differences in skill sets or role, particularly in hockey. There is also a risk that it is used as a crutch, which may be the case with any metric.

Targeting the Head

Prior explorations into answering the question have been detailed and involved, and rightfully so, aggregating and adjusting an incredible amount of data to create a single player-season value.[1] However, I will attempt to reverse engineer a single metric based on in-season data from a project.

For the 2015-16 season, the CrowdScout project aggregated the opinions of individual users. The platform uses the Elo formula, a memoryless algorithm that constantly adjusts each player’s score with new information. In this case, information is the user’s opinion that is hopefully guided by the relevant on-ice metrics (provided to the user, see below). Hopefully, the validity of this project is closer to Superforecasting than the NHL awards, and it should be: the ‘best’ users or scouts are given increasingly more influence over the ratings, while the worst are marginalized.[2]
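To make the Elo mechanics concrete, here is a minimal sketch of a single head-to-head update in Python. The expected-score formula is the standard Elo one; the K-factor of 32 and the 400-point scale are conventional chess defaults, and `scout_weight` is a hypothetical stand-in for giving better scouts more influence. None of these values are the actual CrowdScout parameters.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Standard Elo probability that player A is preferred over player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))

def elo_update(winner, loser, k=32.0, scout_weight=1.0):
    """Return the new (winner, loser) ratings after one scout judgment.

    scout_weight is a hypothetical multiplier: a trusted scout's pick
    moves the ratings more than a marginalized scout's pick.
    """
    surprise = 1.0 - expected_score(winner, loser)
    delta = k * scout_weight * surprise
    return winner + delta, loser - delta

# Two equally rated players: the chosen player gains what the other loses
even_matchup = elo_update(1500.0, 1500.0)                           # (1516.0, 1484.0)
# The same pick from a scout with half the influence moves ratings half as far
low_weight_pick = elo_update(1500.0, 1500.0, scout_weight=0.5)      # (1508.0, 1492.0)
```

The update only needs the two current ratings, which is what makes the algorithm memoryless: no history of past judgments has to be stored or replayed.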

The CrowdScout platform ran throughout the season with over 100 users making over 32,000 judgments on players, creating a population of player ratings ranging from Sidney Crosby to Tanner Glass. The system has largely worked as intended, but needs to continue to acquire an active, smart, and diverse user base – this will always be the case when trying to harness the ‘wisdom of the crowd.’ Hopefully, as more users sign up and smarter algorithms emphasize the opinions of the best, the Elo rating will come closer to answering the question posed to scouts as they are prompted to rank two players – if the season started today, which player would you choose if the goal were to win a championship?

stamkosvkopitar
Let’s put our heads together

Each player’s Elo is adjusted by the range of ratings within the population. The result, ranging from 0 to 100, generally passes the sniff test, at times missing on players due to too few or poor ratings. However, this player-level rating provides something more interesting – a target variable to create an empirical model from. Whereas in theory, WAR is a cumulative metric representing incremental wins added by a player, the CrowdScout Score, in theory, represents a player’s value to a team pursuing a championship. Both are desirable outcomes, and will not work perfectly in practice, but this is hockey analytics: we can’t let perfect get in the way of good.
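One plausible reading of "adjusted by the range of ratings within the population" is a simple min-max rescaling, sketched below in Python. This is an assumption about the adjustment, not a description of the platform's actual code, and the Elo values are invented for the example.

```python
def scale_to_score(elo_ratings):
    """Rescale raw Elo ratings to 0-100 using the population's min and max.

    Assumes a plain min-max adjustment; the real CrowdScout adjustment
    may differ.
    """
    lowest, highest = min(elo_ratings), max(elo_ratings)
    spread = highest - lowest
    if spread == 0:
        # Everyone rated identically: no range to scale by, so use the midpoint
        return [50.0 for _ in elo_ratings]
    return [100.0 * (r - lowest) / spread for r in elo_ratings]

# Invented ratings: the lowest maps to 0, the highest to 100
scores = scale_to_score([1350.0, 1500.0, 1650.0])  # [0.0, 50.0, 100.0]
```

A consequence of this kind of scaling is that a player's score can move without any new judgments on that player, simply because the best or worst rating in the population changed.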

Why is this analysis useful or interesting?

  1. Improve the CrowdScout Score – a predicted CrowdScout Score based on on-ice data could help identify misvalued players and reinforce properly valued players. In sum, a proper model would be superior to the rankings sourced from the inaugural season with a small group of scouts.
  2. Validate the CrowdScout Score – Is there a proper relationship between CrowdScout Score and on-ice metrics? How large are the residuals between the predicted score and actual score? Can the CrowdScout Score or predicted score be reliably used in other advanced analyses? A properly constructed model that reveals a solid relationship between crowdsourced ratings and on-ice metrics would help validate the project. Can we go back in time to create a predicted score for past player seasons?
  3. Evaluate Scouts – The ability to reliably predict the CrowdScout Score based on on-ice metrics can be used to measure the accuracy of the scout’s ratings in real-time. The current algorithm can only infer correctness in the future – time needs to pass to determine whether the scout has chosen a player preferred by the rest of the crowd. This could be the most powerful result, constantly increasing the influence of users whose ratings agree with the on-ice results. This, in turn, would increase the accuracy of the CrowdScout Score, leading to a stronger model, continuing a virtuous circle.
  4. Fun – Every sports fan likes a good top 10 list or something you can argue over.

Reverse Engineering the Crowd

We are lucky enough to have a shortcut to a desirable target variable, the end of season CrowdScout Score for each NHL player. We can then merge on over 100 player-level micro stats and rate metrics for the 2015-16 season, courtesy of puckalytics.com. There are 539 skaters that have at least 50 CrowdScout games and complete metrics. This dataset can then be used to fit a model using on-ice data to explain CrowdScout Score, then we use the model output to predict the CrowdScout Score, using the same player-level on-ice data. Where the crowd may have failed to accurately gauge a player’s contribution to winning, the model can use additional information to create a better prediction.

The strength of any model is proper feature selection and prevention of overfitting. Hell, with over 100 variables and over 500 players, you could explain the number of playoff beard follicles with spurious statistical significance. To prevent this, I performed a couple of operations using the caret package in R.

  1. Find Linear Combination of Variables – using the findLinearCombos function in caret, variables that were mathematically identical to a linear combination of another set of variables were dropped. For example, you don’t need to include goals, assists, and points, since points are simply assists plus goals.
  2. Recursive Feature Elimination – using the rfe function in caret and a 10-fold cross-validation control (10 subsets of data were considered when making the decision, and all decisions were made on the model’s performance on unseen, or holdout, data), the remaining 80-some skater variables were considered from most powerful to least powerful. The RFE plot below shows a maximum strength of model at 46 features, but most of the gains are achieved by about the 8 to 11 most important variables.
  3. Correlation Matrix – create a matrix to identify and remove features that are highly correlated with each other. The final model had 11 variables, listed below.

RFE

corr.matrix
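The correlation step is the easiest of the three to illustrate. Below is a rough Python sketch of a greedy correlation filter: walk the features from most to least important and keep each one only if it is not highly correlated with something already kept. caret offers findCorrelation for this kind of pruning, though this sketch is a simplified stand-in rather than that function's algorithm, and the 0.9 cutoff, stat names, and values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def drop_correlated(features, cutoff=0.9):
    """Keep features in order, skipping any that correlate above `cutoff`
    (in absolute value) with a feature that was already kept.

    `features` maps name -> values and is assumed to be ordered from
    most to least important.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < cutoff for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-60 rates for five skaters (illustrative values only)
skater_stats = {
    "points_per_60": [1.0, 2.0, 3.0, 4.0, 5.0],
    "goals_per_60": [0.5, 1.1, 1.4, 2.1, 2.4],   # tracks points very closely
    "hits_per_60": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_correlated(skater_stats)  # ['points_per_60', 'hits_per_60']
```

Ordering matters here: because the more important feature is visited first, it is the redundant, less important one that gets dropped.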

The remaining variables were placed into a Random Forest model targeting each skater’s CrowdScout Score. Random Forest is a popular ensemble model[3]: it randomly subsets variables and observations (random) and creates many decision-trees to explain the target variable (forest). Each observation or player is assigned a predicted score based on the aggregate results of the many decision-trees.

Using the caret package in R, I created a Random Forest model controlled by a 10-fold cross-validation, not necessarily to prevent overfitting, which is not a large concern with Random Forest, but to cycle through all data and create predicted scores for each player. I gave the model the flexibility to try 5 different tuning combinations, allowing it to test the ideal number of variables randomly sampled at each split and number of trees to use. The result was a very good fitting model, explaining over 95% of the CrowdScout Score out of sample. Note that the variation explained, rather than the variance explained, was closer to 70%.
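The idea of cycling through all the data so every player gets a prediction from a model that never saw them can be sketched in a few lines. The Python below is an illustration of out-of-fold prediction only: a one-variable least-squares line stands in for the Random Forest, and the inputs are synthetic, so nothing here reproduces the caret workflow or its results.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def out_of_fold_predictions(x, y, fit, predict, k=10, seed=0):
    """Predict every observation with a model trained on the other folds."""
    preds = [None] * len(y)
    for holdout in kfold_indices(len(y), k, seed):
        held = set(holdout)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in holdout:
            preds[i] = predict(model, x[i])
    return preds

def fit_line(xs, ys):
    """Ordinary least squares for one predictor (stand-in for the forest)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
             / sum((a - mean_x) ** 2 for a in xs))
    return slope, mean_y - slope * mean_x

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

# Synthetic data: a hypothetical on-ice metric and a score that depends on it
metric = [float(i) for i in range(20)]
score = [2.0 * m + 1.0 for m in metric]
oof = out_of_fold_predictions(metric, score, fit_line, predict_line)
```

Each entry of `oof` comes from a model fitted without that row, which is what lets the predicted scores be treated as out-of-sample.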

RF.players

Note the slope of the best-fit relationship between actual and predicted scores is a little less than 1. The model doesn’t want to credit the best players too much for their on-ice metrics, or penalize the worst players too much, but otherwise does a very good job.

RF.VarImp

Capped Flexibility

Let’s return to the original intent of the analysis. We can predict about 95% of CrowdScout Score using vetted on-ice metrics. This suggests the score is reliable, but that doesn’t necessarily mean the CrowdScout Score is right. In fact, we can assume that the actual score is often wrong. How does a simpler model do? A Generalized Linear Model (GLM) using the same on-ice metrics performs fairly well out of sample, explaining about 70% of the variation. The larger error terms of the GLM represent larger deviations of the predicted score from the actual. While these larger deviations result in a poorer model fit, they may also contain some truth. The worse fitting linear model has more flexibility to be wrong, perhaps allowing a more accurate prediction.

GLM.players

GLM.VarImp

coefficients
Note the potential interaction between TOI.GM and position

Residual Compare

How do the player-level residuals between the two models compare? They are largely the same directionally, but the GLM residuals are about double in magnitude. So, for example, the Random Forest model predicts Sean Monahan’s CrowdScout Score to be 64 instead of his current 60, giving a residual of +4 (residual = predicted – actual). Not to be outdone, the Generalized Linear Model doubles that residual, predicting a score of 68 (+8 residual). It appears that both models generally agree, with the GLM being more likely to make a bold correction to the actual score.
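The Monahan example works out as follows, using the residual definition given above (predicted minus actual). This is just the arithmetic from the paragraph written as Python; the variable names are mine.

```python
def residual(predicted, actual):
    """Residual as defined in the text: predicted score minus actual score."""
    return predicted - actual

monahan_actual = 60
rf_residual = residual(64, monahan_actual)    # Random Forest: +4
glm_residual = residual(68, monahan_actual)   # GLM: +8

# Same sign means the two models agree on the direction of the correction
same_direction = (rf_residual > 0) == (glm_residual > 0)
magnitude_ratio = glm_residual / rf_residual  # 2.0, the "about double" above
```

A positive residual under this convention means the model thinks the crowd is underrating the player.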

Residuals-Compares

Conclusion

The development of an accurate single comprehensive metric to measure player impact will be an iterative process. However, it seems the framework exists to fuse human input and on-ice performance into something that can lend itself to more complex analysis. Our target variable was not perfect, but it provided a solid baseline for this analysis and will be improved. To recap the original intent of the analysis:

  1. Both models generally agree when a player is being overrated or underrated by the crowd, though by different magnitudes. In either case, the predicted score is directionally likely to be more accurate than the current score. This makes sense since we have more information (on-ice data). If it wasn’t obvious, it appears on-ice metrics can help improve the CrowdScout Score.
  2. Fortunate, because our models fail to explain between 5% and 30% of the score and vary more from the true ability. Some of the error will be justified, but often it will signal that the CrowdScout Score needs to adjust. Conversely, a beta project with relatively few users was able to create a comprehensive metric that can be mostly engineered and validated using on-ice metrics.
  3. Being able to calculate a predicted CrowdScout Score more accurate than the actual score gives the platform an enhanced ability to evaluate scouting performance in real-time. This will strengthen the virtuous circle of giving the best scouts more influence over Elo ratings, which will help create a better prediction model.
  4. Your opinion will now be held up against people, models, and your own human biases. Fun.

______________________________________________________

Huge thanks to asmean for contributing to this study, specifically advising on machine learning methods.

[1] The Wins Above Replacement problem is not unlike the attribution problem my Data Science marketing colleagues deal with. We know there was a positive event (a win or conversion), but how do we attribute that event to the input actions between hockey players or marketing channels? It’s definitely a problem I would love to circle back to.

[2] What determines the ‘best’ scout? Activity is one component, but picking players that continue to ascend is another. I actually have plans to make this algorithm ‘smarter’, and an explanation is long overdue on my end.

[3] The CrowdScout platform and ensemble models have similar philosophies – they synthesize the results of models or opinions of users into a single score in order to improve their accuracy.

Goaltending and Hockey Analytics – Linked by a Paradox?

There may be an interesting paradox developing within hockey. The working theory is that as advanced analysis and data-driven decision-making continue to gain traction within professional team operations and management, the effect of what can be measured as repeatable skill may be shrinking. The Paradox of Skill suggests that as absolute skill levels rise, results become more dependent on luck than skill. As team analysts continue (begin) to optimize player deployment, development, and management, there should theoretically be fewer inefficiencies and asymmetries within the market. In a hypothetical league of more equitable talent distribution, near perfect information and use of optimal strategies, team results would be driven more by luck than superior management.

Goaltenders Raising the Bar

Certainly forecasting anything, let alone still-evolving hockey analytics, is often a fool’s errand – so why discuss? Well, I believe that the paradox of skill has already manifested itself in hockey and actually provides a loose framework of how advanced analysis will become integrated into the professional game. Consider the rise of modern goaltending.

Absolute NHL goaltender ability has continually increased for the last 30 years. However, differential ability between goaltenders has tightened. It has become increasingly difficult to distinguish long-term, sustainable goaltender ability while variations in results are increasingly owed to random chance. Goalies appear ‘voodoo’ when attempting to measure results (read: ability + luck) using the data currently available – much like the paradox of skill would predict.[1] More advanced ways of measuring goaltending performance will be developed (say, controlling for traffic and angular velocity prior to release), but that will just further isolate and highlight the effect of luck.[2]
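A back-of-the-envelope way to see the paradox: if observed results are skill plus independent luck, the share of the spread in results that comes from skill is var(skill) / (var(skill) + var(luck)). The Python sketch below uses that simplifying assumption, and the standard deviations are invented for illustration rather than estimated from any goaltending data.

```python
def skill_share(sd_skill, sd_luck):
    """Share of result variance attributable to skill, assuming
    result = skill + luck with the two independent."""
    var_skill, var_luck = sd_skill ** 2, sd_luck ** 2
    return var_skill / (var_skill + var_luck)

# Hypothetical spreads, in save percentage points
wide_talent_era = skill_share(sd_skill=0.010, sd_luck=0.010)   # half skill, half luck
tight_talent_era = skill_share(sd_skill=0.005, sd_luck=0.010)  # skill share drops to about a fifth
```

Nothing about luck changed between the two cases. Halving the spread in talent alone is enough to make results look mostly random, even though every goalie in the second case could be better in absolute terms.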

Spot the Trend
Data courtesy of hockey-reference.com

Will well-managed teams create a similar paradox amongst competing professional teams in the future? Maybe. Consider that such a team would maximize the expected value of talent acquired, employ optimal on-ice strategies, and employ tactics to improve player development. Successful strategies could be reverse engineered and replicated, cascading throughout the league – in theory. Professional sports leagues are ‘copycat’ leagues and there is too much at stake not to adopt a superior strategy, despite a perceived coolness to new and challenging ideas.

Dominant Strategies

“I don’t care what you do, just stop the puck”

How did goaltending evolve to dominate the game of hockey? And what parallel pathways need to exist in hockey analytics to do the same?

  1. Advances in technology – equipment became lighter and more protective.[3] This allowed goaltenders to move better, develop superior blocking tactics (standing up vs butterfly), cover more net, and worry less about catching a painful shot. The growth of hockey analytics has been dependent on web scraping, automation, and increasing processing power and will soon come to rely on data derived from motion-tracking cameras. Barriers to entry and cost of resources are negligible, lending all fanalysts the opportunity to contribute to the game.
  2. Contributions from independent practitioners – The ubiquitous goaltending coach position is a relatively new one compared to most professional leagues. In the early 2000s, I was lucky enough to cross paths with innovative goaltending instructors who made new tactics, strategies, and training methods available to young goaltenders. Between their travel, camps, and clinics (and later their own development centers) they diffused innovative approaches to the position, setting the bar higher and higher for students. A few of these coaches went on to become NHL goalie coaches – effectively capturing a position that didn’t exist 30 years prior. Now goalie coaches have cascaded down to all levels of competitive hockey.[4] Similarly, the most powerful contributions to the hockey analytics movement have been by bright individuals exposing their ideas and studies to the judicious public. The best ideas were built upon and the rest (generally) discarded. Will hockey analytics evolve (read: become accepted widely among executives) faster than goaltending? I don’t know – a goaltending career takes well over a decade to mature, but goaltenders play many games, providing feedback on new strategies rather quickly.[5] Comparatively, ideas develop quicker but might take longer to demonstrate their value – not only are humans hard-wired to reject new ideas, there are fewer managerial opportunities to prove a heavy data-driven approach to be a dominant strategy.
  3. Existence of a naïve acceptance – The art (and science) of goaltending is not especially well understood among many coaches, particularly with relative skill levels converging. However, managers and coaches do understand results. Early in my career, I had a coach who was only comfortable with stand-up goaltenders, his own formative experiences occurring when goaltenders predominately remained erect (in order to keep their poorly padded torso and head from constant danger). However, he saw a dominant strategy (more net coverage) and placed faith in my ability without a comprehensive understanding of, or comfort with, modern goaltending. Analytics will have to be accepted the same way – gradually, but built on demonstrated effectiveness. Not everyone is comfortable with statistics and probabilities, but like goaltenders, the job of analysts is to produce results. That means rigorous and actionable work that offers a superior strategy to the status quo. This will earn the buy-in from owners and senior management who understand that they can’t be at a competitive disadvantage.

Forecasting Futility

Clearly the arc of the analytics evolution will differ from the goaltender evolution, primary reasons being:

  • Any sweeping categorization of a two-decade-plus ‘movement’ is prone to simplification and revisionist history.
  • While goaltending as a whole has improved substantially, incremental differences in ability still obviously exist between goaltenders. In the same way, not all analysts or teams of analysts will be created equal. A non-zero advantage in managerial ability may compound over time. However, the signal will likely be less significant than variation in luck over that extended timeframe. In both disciplines, that rising ability may give way to a paradox of not being able to decipher their respective skills, muddying the waters around results.
  • Goaltending results occur immediately and visibly. Fair or not, an outlier goaltender can be judged after a quarter of a season; managerial results will take longer to come to fruition. Not only that, we only observe one of many alternative histories for the manager, while we get to observe thousands of shots against a goaltender. Managerial decisions will almost always operate under a fog of uncertainty.

Alternatively, it is important to consider the distribution of athlete talent against that of those in the knowledge economy. Goaltenders are bound by normally distributed deviations of size, speed, and strength. Those limitations don’t exist for engineers and analysts, but they do operate in a more complex system, leaving most decisions to be subjected to randomness. This luck is compounded by the negative feedback loops of the draft and salary cap. It is unlikely a masterfully designed team would permanently dominate, but this suggests some teams will hold an analytical advantage and the league won’t turn into some efficient-market-hypothesis-all-teams-50%-corsi-50%-goals-coin-flip game. But if a superstar analyst team could consistently and handily beat a market of 29 other very good analyst teams in a complex system, they should probably take their skills to another more profitable or impactful industry.

xkcd.com

Other Paradoxes of Analytics

Because these are confusing times we live in, I’d be remiss if I didn’t mention two other paradoxes of hockey analytics.

  • Thorough, rigorous work is often difficult for senior decision-makers to understand. This is a problem in many data-intensive industries – analytical tools outpace the general understanding of how they work. It seems that (much like the goaltending framework available to us) once data-driven strategies are employed and succeed, all teams will be forced to buy in and trust that they have hired competent analysts that can deliver actionable insights from a complex question. Hopefully.

  • With more and more teams buying into analytics, some of the best work is taken private. The best work is taken in-house seemingly overnight, sometimes burying a lot of foundational work and data. That said, these issues are widely understood and there is a noble and concerted effort to maintain transparency and openness. We can only hope that these efforts are appreciated, supported, and replicated.

 

Final Thoughts

The best hockey analysis has borrowed empiricism and data-driven decision-making from the scientific method, creating an expectation that as hockey analytics gain influence at the highest levels, we (collectively) will know more about the game.[7] However, assuming the best hockey analysts end up influencing team behavior, it is possible much of the variation between NHL teams[8] will be random chance – making future predictive discoveries less likely and weakening the relationship of current discoveries.

Additionally, when it feels like the analytical approach to hockey is receiving unjustified push back or skepticism, it is important to remember that the goaltender evolution, initiated by fortuitous circumstance, eventually forced buy-ins from traditionalists by offering a superior approach and results. However, increasing absolute skill in a field can have unintended consequences – relative differences in skill will decrease, possibly causing results to become more dependent on luck than skill. Something to consider next time you try to make sense of the goaltender position.

 

[1] This is not to say all goalies in 2016 are of equal skill levels, but they are absolutely more talented than their ancestors and fall within a smaller range of abilities. That said, outside of a top 2 or 3 guys, the top 5-10 list of goalies is a game of musical chairs, quarter to quarter, season to season.

[2] Goaltenders don’t get a chance to ‘drive the play,’ so it is very important to control for external factors. This can’t be done comprehensively with current data. Even with complete data, it may be futile.

[3] And cooler, possibly attracting better athletes to the position, your author notwithstanding.

[4] Another feature of the paradox of rising skill levels: to fail to improve is the same as getting worse. Hence, employing a goalie coach is necessary in order to prevent a loss of competitiveness. The result: plenty of goalie coaches of varying ability, but likely without a strong effect on their goaltender’s performance. This likely causes some skepticism toward their necessity. This is probably a result of their own success, they are indirectly represented by an individual whose immediate results might owe more to luck than incremental skill aided by the goalie coach.

[5] For example, a strategy devised at 6 years old of lying across the goal line forcing other 6 year-olds to lift the puck proved to be inferior and was consequently dropped from my repertoire.

[7] Maybe even understanding the link between shot attempts and goals (you can read this sarcastically if you like).

[8] And other leagues that are able to track and provide accurate and useful data.

Re-Tooling the Rebuild – An Auction Based Entry Draft System

The Current Entry Draft System

The NHL and NBA annual entry drafts have become strange affairs. The lottery systems (and their ever-changing weights) have at times encouraged fans of middling teams to urge their favorite team to underperform in order to have a higher probability of selecting a top prospect. This is known as tanking or, more euphemistically, re-building or re-tooling. Many rational fans suggest that if you are not going to win a championship, you might as well maximize the chance of adding top young (cost-controlled) talent. Under the current system, they aren’t wrong much of the time.
There are no easy solutions to such a problem because there are two opposing forces at play:
     1)    The goal of the entry draft is to distribute new talent fairly throughout the league. Ideally, the worst teams should have an opportunity to draft the best talent, giving them an opportunity to compete in the future.
     2)    The goal of the league is to maintain a competitive product throughout the season. In a world where the incentive to win is diminished, the league product and brand suffer.

A lottery makes some sense. Teams can lose on purpose, but that still doesn’t guarantee the top pick. Would ‘rebuilding’ teams strip down their roster and be satisfied with a top 5 pick? Probably. Would the same team completely throw a few games to increase the probability of drafting 1st overall by 5%? Probably not. Draft lotteries use randomness to uphold a general competitive balance within the league.

Tanking it to the extreme?

 

However, very few teams are happy with the current system. This is a function of dumb luck and the perceived abuse of the entry draft system. Research suggests the value (the average quality of player historically drafted in that position) of a draft pick decays non-linearly. That is, the difference in value between the 1st overall and 2nd overall is greater than the difference between the 2nd overall and 3rd overall and so on. If you can’t compete, you are better off trying to maximize draft pick value by taking advantage of this non-linear curve – get the highest pick possible.
With very few happy with the current system, alternatives have been suggested and gained some traction (most recently the Gold Plan). However, year to year lotteries with 30 teams will never appear completely random to the human mind, so there inevitably will be annual disappointment with the system from all but one or two fanbases.

The Entry Draft Auction Proposal

The parameters below require more research, but more importantly, must be sold to the owners and teams. An expanded rationalization and methodology can be found later.
  • Each team receives a set amount of draft currency based on their finish during the regular season
  • The worst team would receive 1,000 base draft units (to be coined Bettmans in the NHL). Rank ordered from worst to best each subsequent team would receive 10 fewer units, meaning the champion would receive 710 draft units.
  • Each team would have their base draft units adjusted by the z-score of their offensive production multiplied by 10. A team can receive more draft units than the team below them by out-scoring them by a significant amount (approximately 23 goals in the NHL).
  • The team’s maximum bid is set to their base draft units plus their offensive production adjustment. This prevents bottom teams from selling current assets in order to secure enough draft units to guarantee winning the bid for the top pick (this would be a terrible strategy unless there was a generational talent available, but still).
  • Draft units by year could be traded in absolute, share of total, or conditional amounts.
  • On draft day each pick or draft slot is auctioned off in real-time at the draft. The number of draft slots available remains unchanged from the current system.
  • Bids of whole units are blindly and simultaneously submitted. Ties would go to the team with the fewest number of picks to that point, then highest number of picks since prior team pick, else re-auction between tied teams.[1] Losing teams would lose no draft units. The winning team would lose their bid amount, or alternatively the value of the 2nd highest bid.[2]
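To make the bidding rules above concrete, here is a minimal sketch of how a single draft slot might be resolved. The names `Team` and `resolve_slot` are hypothetical, and a few details are my own assumptions rather than part of the proposal: I read the second tie-breaker as "the team that has waited the most slots since its last pick," invalid bids are simply discarded, and a lone bidder under the second-price variant pays their own bid.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    units: int                 # draft units currently held
    max_bid: int               # cap: base units plus scoring adjustment
    picks_made: int = 0        # slots won so far in this draft
    slots_since_pick: int = 0  # slots auctioned since this team last won


def resolve_slot(teams, bids, second_price=False):
    """Resolve one blind, simultaneous auction for a draft slot.

    teams: dict mapping team name -> Team
    bids:  dict mapping team name -> whole-unit bid
    Returns (winner, price), or (None, tied_names) if a re-auction is needed.
    """
    # Assumption: bids above the team's holdings or cap are discarded.
    valid = {n: b for n, b in bids.items()
             if 0 < b <= min(teams[n].units, teams[n].max_bid)}
    if not valid:
        return None, []

    top = max(valid.values())
    tied = [n for n, b in valid.items() if b == top]

    # Tie-breaker 1: fewest picks made to this point.
    if len(tied) > 1:
        fewest = min(teams[n].picks_made for n in tied)
        tied = [n for n in tied if teams[n].picks_made == fewest]
    # Tie-breaker 2 (my reading): longest wait since the team's prior pick.
    if len(tied) > 1:
        longest = max(teams[n].slots_since_pick for n in tied)
        tied = [n for n in tied if teams[n].slots_since_pick == longest]
    # Still tied: re-auction between the tied teams.
    if len(tied) > 1:
        return None, sorted(tied)

    winner = tied[0]
    rivals = [b for n, b in valid.items() if n != winner]
    # The winner pays their own bid, or the highest rival bid if second_price.
    price = max(rivals) if (second_price and rivals) else top

    # Only the winner loses draft units.
    teams[winner].units -= price
    teams[winner].picks_made += 1
    for t in teams.values():
        t.slots_since_pick += 1
    teams[winner].slots_since_pick = 0
    return winner, price
```

Calling `resolve_slot` once per draft slot, and calling it again with only the tied teams' new bids when it returns `None`, would walk through a whole draft. The `second_price` switch is there because the proposal leaves open whether the winner pays their own bid or the runner-up's.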

Rationalizations & Methodology

A further explanation of some of the main ideas behind the Draft Auction.

Base Draft Unit Allocation

The initial allocation of base Draft Units requires research and agreement from all parties. I think there are different ways to do this but here is my framework.
A number of attempts have been made at creating an expected value of a draft pick in the NHL. I used research from Michael Schuckers, @DTMAboutHeart and The Leafs Nation’s Chemmy’s adoption of Avs blogger Jibblescribbits’ work. Each of these models was indexed against the 1st overall pick (given a value of 1,000), providing an average draft value by pick relative to the 1st overall.
Sources: http://myslu.stlawu.edu/~msch/sports/Schuckers_NHL_Draft.pdf http://theleafsnation.com/2011/3/16/on-relative-worth-of-draft-picks http://donttellmeaboutheart.blogspot.com/2014/11/nhl-draft-pick-value-chart.html
In the NHL the team with the top draft position can be expected to recoup about twice as much talent as the championship team. The shape of the curve also suggests the value by position decays again in a non-linear way. Should the difference in expected value received by the worst team and 2nd worst team be greater than the difference received by the 2nd worst and 3rd worst team? Probably not in a fair system. Also consider that these positions were likely arranged by a lottery.
I would argue that a more equitable system would decay team-level draft value linearly. This can only be accomplished by assigning the granular draft units proposed above. The graphic below reinforces how this could be accomplished by distributing draft units.
An auction system and agreed upon distribution of draft units would also allow the league to close the gap between the expected value received between the top and bottom teams. The worst team receiving twice the expected draft value seems excessive in the age of salary-cap assisted parity. Expanded research might answer this question in a quantitative manner, but realistically the robustness of the research would take a backseat to a buy-in among the 30 teams. The Auction Entry Draft proposal sells the idea of a more equitable and competitive league (and nobody envisions themselves being the worst team in the league) so it seems like there would be support for closing the gap.
I would suggest the championship team should receive 71% of the draft value of the worst team (see chart above). This is the result of easy math – each team receives 10 fewer base draft units than the team immediately below them in the standings. The formula could be expanded to account for non-playoff team ties, distributing points throughout the league from the worst to best teams in a linear fashion.
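The "easy math" can be sketched in a few lines. The function name `base_draft_units` is hypothetical, and the sketch assumes a 30-team league with no ties in the standings:

```python
def base_draft_units(n_teams=30, top=1000, step=10):
    """Base draft units by standing: index 0 is the worst team, the last is the champion."""
    return [top - step * i for i in range(n_teams)]


units = base_draft_units()
print(units[0], units[-1])   # 1000 710
print(units[-1] / units[0])  # 0.71
print(sum(units))            # 25650 base units in circulation league-wide
```

Changing `step` is the lever the league would use to widen or narrow the gap between the top and bottom teams.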

Bonus Draft Units for goals scored

Yes, it is kind of video game-y, but there are 2 reasons I think it would be worthwhile:
     1)    Add some noise to the system. If a generational player came along that you would trade your entire draft for, the last-place team couldn’t simply sit on their standing position and out-bid everyone; other bottom feeders could outbid them by out-scoring them by the appropriate margin. Obviously, everyone is trying to score as many goals as possible anyway; this just keeps teams honest.
     2)     Incentivization of higher scoring strategies – this is generally good for excitement.
Between 2007 and 2016 (excluding the lockout-shortened 2012-13 season) teams scored an average of 222 goals per season, with a standard deviation of 23. Full equation:
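$$\text{Draft Units} = \underbrace{1000 - 10\,(r - 1)}_{\text{base units}} + \underbrace{10 \times \frac{GF - 222}{23}}_{\text{scoring adjustment}}$$
where r is the team’s rank from the bottom of the standings (r = 1 for the worst team) and GF is the team’s goals scored that season.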
Below are the distributions of Goals For z-scores over the last 8 seasons. If goal totals are roughly normally distributed, about 68% of teams would not have their draft units adjusted by more than 10.
Data courtesy of hockey-reference.com
Alternatively, this calculation could use goal differential. Or there could be no adjustment at all, but then the concern would again be that a team could theoretically tank and guarantee the right to draft a generational talent with their entire draft stock. Adding an adjustment would prevent this strategy.
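As a rough sketch of the goals-scored adjustment, the snippet below adds 10 units per standard deviation of goals above or below the league average, using the 222-goal mean and 23-goal standard deviation quoted above. The function name `adjusted_units` and the rounding to whole units are my own assumptions.

```python
LEAGUE_MEAN_GF = 222  # average team goals per season, from the figures above
LEAGUE_SD_GF = 23     # standard deviation of team goals per season


def adjusted_units(base_units, goals_for,
                   mean=LEAGUE_MEAN_GF, sd=LEAGUE_SD_GF, weight=10):
    """Base draft units plus the z-score of goals scored times the weight."""
    z = (goals_for - mean) / sd
    # Assumption: round to whole units (Python rounds exact halves to even).
    return base_units + round(weight * z)


# The worst team (1,000 base units) scores 200 goals; the second-worst
# (990 base units) scores 230 and ends up with the larger bankroll.
print(adjusted_units(1000, 200))  # 990
print(adjusted_units(990, 230))   # 993
```

Swapping `goals_for` for goal differential (with its own mean and standard deviation) would give the alternative adjustment mentioned above.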

Trading

Can’t mess with Trade Deadline Day. Teams can get even more creative since there are no draft picks constrained by round and standing. Want to trade 100 draft units? Great. Trade for 10% of the other team’s base draft units? Cool. 500 draft units if they make the Cup final, 200 otherwise? Sign right here.
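The three flavours of trade (absolute, share of total, conditional) could be represented with one small record. This is only an illustrative sketch: `UnitTrade` and `units_owed` are hypothetical names, and I am assuming the condition is settled with a simple yes/no on whether the relevant team made the Cup final.

```python
from dataclasses import dataclass


@dataclass
class UnitTrade:
    fixed: int = 0      # absolute amount, e.g. 100 draft units
    share: float = 0.0  # share of the sending team's base draft units, e.g. 0.10
    if_final: int = 0   # conditional amount owed if the team makes the Cup final
    otherwise: int = 0  # conditional amount owed if it does not


def units_owed(trade, base_units, made_final):
    """Whole draft units the sending team owes once the season is settled."""
    conditional = trade.if_final if made_final else trade.otherwise
    return trade.fixed + round(trade.share * base_units) + conditional


print(units_owed(UnitTrade(fixed=100), 900, False))                   # 100
print(units_owed(UnitTrade(share=0.10), 900, False))                  # 90
print(units_owed(UnitTrade(if_final=500, otherwise=200), 900, True))  # 500
```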

Limited Number of Auction Slots

This is more of a Players Association issue. A cap on the number of auctions keeps the number of drafted players the same or fewer. A case where draft units were not properly rationed and no teams were left to bid on the last few picks would generally be a good thing.

Real-Time Auction System

This is where I give pause. Entry drafts are high-profile events, with lots on the line. The technology component would be critical and would require the right safeguards, as any failure would be embarrassing.[3] Every team would have 30 seconds to submit a bid, and the winner or a re-auction would be announced immediately. A mandatory 30 more seconds would then pass so that any team with a technical objection could raise it to officials, after which the winner would be on the clock to make their pick.

Information Overload?

Entry drafts have traditionally been monkeys throwing darts at a dartboard, right? So why add another layer of complexity?
Well, economists love auctions because of their ability to imply value, particularly hard-to-calculate values (like the right to draft an unproven, underdeveloped teenager). An auction-based system would be a bonanza of implied information all while being highly entertaining.
It would also further encourage the operational analysis that has recently grown in sport. Drafts would be fueled by both computer models and high-drama gambles.[4] Data at the draft slot-team level could be made available to teams and the public, allowing for a unique look into the question – how do teams value draft picks? The trend is clear – advanced analytical methods are becoming the norm in sport, and this system would only accelerate that healthy trend.
Most teams would struggle to neatly quantify the value of a draft auction (factors could include, but are not limited to, talent currently on the board, total amount of draft units in circulation, current team draft units, historical valuation of draft slot), but it would be a beautiful mess of varying strategies with plenty of unforeseen events. Poorly managed teams would struggle with this configuration but the incentive is clear under an auction system: organizations must commit to competing annually, provide an exciting product, and leverage analytical methods.

Conclusion

The Entry Draft Auction would:
  • Remove the incentive to tank, distributing talent in a more equitable, linear way
  • Incentivize offensive strategies, increasing quality of product
  • Create a unique and highly entertaining experience, producing highly informative data
  • Create more granular and flexible trade blocks, helping facilitate trades and optimal talent distribution around the league

 


[1] Other tiebreakers may be applied.
[2] This distinction would really only matter at the top of the draft, but it is important.
[3] I can’t think of any league bungling a technological roll out in recent memory. Nope.
[4]  There would be shades of Mike Ditka going all-in on Ricky Williams in 1999.

Welcome to Game Theory

Thanks for visiting the CrowdScout blog – Game Theory!

The CrowdScout platform was designed to automatically and elegantly aggregate the opinions of awesome fanalysts and create unique content – dynamic player rankings that can:
·     aid the decision making of managers (fantasy or professional)
·     settle arguments that happen over cold ones (or not)
·     provide benchmarks for more advanced analysis, i.e. when determining what players are over/undervalued
·     identify scouts with the ability to be ahead of the curve on judging talent
The inaugural beta season was a great learning experience, and I have some exciting plans for season 2, but clearly the website hasn’t hit the critical mass to provide dynamic and self-sustaining content. To supplement the CrowdScout system, I’ll be throwing out some of my own thoughts in my Game Theory blog.

What is Game Theory (or what will it be)?

·     Hopefully delivers both qualitative and quantitative insights to sports (predominantly hockey) – the original idea behind the CrowdScout platform
·     Part thought experiment, part analysis. Some logic and some numbers
·     Whatever seems interesting and easy for me to write. If it is boring to write, I can’t imagine how bad it would be to read
·     Ideas a little different than the standard – meant to be critiqued. Ideas are stronger with more diverse input – one of the main principles behind CrowdScout
·     Potentially more advanced analysis, possibly combining my own proprietary CrowdScout data with public data
My Background

I’ve been lucky enough to live the lives of a collegiate hockey goaltender[1], an antitrust economist[2], and a data scientist. I plan on relying on the ensemble of my experiences rather than one – there are more interesting economists and statisticians discussing sports worthy of your time (I believe market forces have spoken on my goaltending abilities as well – unless the NHL was really serious about increasing goals). After finishing my college hockey career, I took some time away from being completely immersed in hockey while the hockey analytics community matured. When I decided I wanted to contribute, I thought it would be best to create something different – a platform that was able to combine analytic and traditional information in a meaningful way. I hope to do the same in the Game Theory blog.


[1] Full disclosure: I played Division III NESCAC hockey (only because there was no Division IV, as my coach liked to remind me)
[2] I also fell ass-backwards into doing antitrust economic consulting and advising on the most recent NHL lockout – a bittersweet, but very exciting experience