Goalie Points Above Expected (PAX)

Pictured: Dominik Hasek, who made 70 saves in a 1994 playoff game, beating the New Jersey Devils 1-0 in the 4th overtime. Hasek didn’t receive goal support for the equivalent of 2 full regulation games, but he won anyway. What is the probability of Hasek winning this game and what does it tell us about his contribution to winning?

A Chance to Win

I was lucky enough to attend (and later work at) the summer camps of Ian Clark, who went on to coach Luongo in Vancouver and most recently Bobrovsky in Columbus. Part of the instruction included diving into the mental side of the game. A simple motto that stuck with me was: “just give your team a chance to win.” You couldn’t do it all, and certainly couldn’t do it all at once, it was helpful to focus on the task at hand.

You might give up a bad goal, have a bad period, or two or three, but if you can make the next save to keep things close, a win would absolve all transgressions. Conversely, you might play well, receive no goal support, and lose. Being a goalie leaves little in your control. The goal support a goalie receives is (largely[1]) independent of their ability and outside of rebounds, so are most chances they face[2]. Pucks take improbable bounces (for and against) and 60 minutes is a very short referendum on who deserves to win or lose.

Think of being a hitter in baseball and seeing some mix of fastballs down the middle and absolute junk and the chance to demonstrate marginal ability relative to peers on every 20th pitch.

Smart analysis largely throws away what’s out of the goalies control, focusing on their ability to make saves. This casts wins, whatever they are worth, as only a team stat.

Taking a step back, there’s two problems with this:

  • A central purpose of hockey analytics is to figure out and quantify what drives winning, and removing wins from the equation to focus on save efficiency feels like cruising through your math test and handing it in, only to realize you missed the last page. So close, yet so far.
  • Goalies, coaches, fans, primarily care about winning, so it’s illuminating to create a metric that reflects that. Aligning what’s measured and what matters can be helpful and interesting, at the very least deserves some more advanced exploration.

What Matters

Analysis is at its strongest when we can isolate what is in the goaltender’s control, holding external factors constant the best we can. For example, some teams may give up more dangerous chances than others, so it is beneficial to adjust goaltender save metrics by something resembling aggregate shot quality, such as expected goals. Building on this we can evaluate a goaltender’s ability to win games as a function of the quality of chances they face and the goal support they receive.

To do this we can calculate the expected points based on the number of goals a team scores and the number of chances they give up. Because goalies are partially responsible for rebounds, we can strip out rebounds and replace with a less chaotic, more stable expected rebounds. The result is weighing every initial shot as a probability of a goal and a probability of a rebound, converting expected rebounds to expected goals by using the historical shooting % on rebounds, 27%.

\(Expected Goals Against_{n} =\sum\limits_{i=1}^n P(Goal)_{i} + (0.27\times P(Rebound)_{i})\)

A visual representation of the interaction between these factors supports the expectation – scoring more goals and limiting chances (expected goals) against increases expected points gained. Summed to team-level this information could be used to create a Wins Threshold metric, identifying which goalies need to stand on their heads regularly to win games.

Novel concept

Goalie Points Above Expected Metric (PAX Goaltendana)

The expected points gained based on goal support and chances against will be used to compare to the actual points gained in games started by a goaltender. How does this look in practice? Earlier this season, November 4st, Corey Crawford faced non-rebound shots that totaled 2.4 expected goals against, while Chicago only scored 1 goal in regulation. Simulating this scenario 1,000 times suggests with an average goaltending performance Chicago could expect about 0.5 points (the average of all simulations, see below). However, Crawford pitched a shutout and Chicago won in regulation, earning Chicago 2 points. This suggests this Crawford’s performance was worth about 1.5 points to Chicago, or 1.5 Points Above Expected (PAX).

Recipe for success?

Tracking each of Crawford’s starts (ignoring relief efforts) game-by-game show he’s delivered a few wins against the odds (dark green), while really only costing Chicago one game, against New Jersey (dark red).

Crawford’s War March

The biggest steal of the 2017-18 season so far using this framework? Curtis McElhinney on December 10th faced Edmonton shots worth about 5 expected goals (!) and received 1 goal in support. A team might expect 0.05 points under usual circumstances, but McElhinney pitched a shutout and Toronto got the 2 points.

The Art of the Steal

Other notable performances this season is a mixed bag of big names and backups.

Goalie Date Opponent Expected GA Goal Support Expected Points Actual GA Actual Points PAX
CURTIS MCELHINNEY 12/10/17 EDM 5.07 1 0.06 0 2 1.94
CORY SCHNEIDER 11/1/17 VAN 3.78 1 0.17 0 2 1.83
AARON DELL 11/11/17 VAN 3.18 1 0.27 0 2 1.73
TRISTAN JARRY 11/2/17 CGY 2.93 1 0.33 0 2 1.67
ANTON KHUDOBIN 10/26/17 S.J 4.05 2 0.37 1 2 1.64
CAREY PRICE 11/27/17 CBJ 4.12 2 0.37 1 2 1.64
MICHAL NEUVIRTH 11/2/17 STL 2.66 1 0.38 0 2 1.62
SERGEI BOBROVSKY 12/9/17 ARI 2.72 1 0.39 0 2 1.61
PEKKA RINNE 12/16/17 CGY 3.72 2 0.42 0 2 1.58
ROBERTO LUONGO 11/16/17 S.J 2.49 1 0.42 0 2 1.58

Summing to a season-level reveals which goalies have won more than expected. Goalies above the diagonal line (where points gained = points expected) had delivered positive PAX, goalies below the line had negative PAX.

The Name of the Game

Ground Rules

For simplicity, games that go to overtime will be considered to be gaining 1.5 points for each team, reflecting the less certain nature of the short overtime 3-on-3 and shootout. This removes the higher probability of a goal and quality chances against associated with overtime, which is slightly confounding[3], bringing the focus to regulation time goal support.

This brings up an assumption the analysis originally builds on – that goal support is independent of goaltender performance. We know that score effects suggest a team that is trailing will likely generate more shots and as a result are slightly more likely to score. A bad goal against might create a knock-on effect where the goaltender receives additional goal support. While it is possible that the link between goaltender performance and goal support isn’t completely independent (as we might expect in a complex system like hockey), the effect is likely very marginal. But it this scenario a win would be considered more probable, further discrediting any potential win or loss. Generally, the relationship between goaltender performance and goal support is weak to non-existent.

No gifts under the Christmas tree

However, great puckhandling goalies might directly or indirectly help aid their own goal support by helping the transition out of their zone, keeping their defensemen from extra contact, and other actions largely uncaptured by publicly available data. Piecemeal analysis suggests goalies have little ability to help create offense, but absence of evidence does not equal evidence of absence. This will have to be an assumption the analysis will have to live with[4], any boost to goal support would likely be very small.

Taking the Leap – Icarus?

The goal here is to measure what matters, direct contributions to winning. This framework ties together the accepted notion that the best way from a goaltender to help is team win is to make more saves than expected with the contested idea that some are more likely to make those saves in high leverage situations than others, albeit in an indirect way. To most analysts, being clutch or being a choker are just some random processes with a some narrative applied.

However, once again, absence of evidence does not equal evidence of absence[5]. I imagine advanced biometrics might reveal that some players experience a sharper rise in stress hormones which might effect performance (positively or negatively) during a tie game than if down by a handful of goals. I know I felt it at times, but would have difficulty quantifying its marginal effect on performance, if any. A macro study across all goalies would likely be inconclusive as well. Remember NHL goalies are a sample of the best in the world, those wired weakly very likely didn’t make it (like me).

But winning is important, so it is worth making the jump from puck-stopping ability to game-winning ability. The tradeoff (there’s always tradeoffs) is we lose sample size by a factor of about 30, since the unit of measure is now a game, rather than a shot. This invites less stable results if a game or two have lucky or improbable outcomes. On the other hand, it builds in the possibility some guys are able to raise their level of play based on the situation, rewarding a relatively small number of timely saves, while ignoring goals against when the game was all but decided. I can think of a few games that got out of control where the ‘normal circumstances’ an expected goals model assumes begin to break down.

Low leverage game situation, high leverage franchise situation

Winning DNA?

All hockey followers know goalies can go into brick-wall mode and win games by themselves. The best goalies do it more often, but is it a more distinguishable skill than the raw ability to prevent goals? Remember, we are chasing the enigmatic concept of clutch-ness or ability to win at the expense of sample size, threatening statistically significant measures that give analysis legs.

To test this we can split goalie season into random halves and calculate PAX in each random split, looking at the correlation between each split. For example, goalie A might have 20 of their games with a total PAX of 5 end up in ‘split 1’ and their other 20 games with a PAX of 3 in ‘split 2.’ Doing this for each goalie season we can look at the correlations between the 2 splits.[6]

Using goalie games from 2009 – 2017 we randomly split each goalie season 1,000 times at minimum game cutoffs ranging from 20 to 50,[7] checking the Pearson correlation between each random split. Correlations consistently above 0 suggest the metric has some stability and contains a non-random signal. As a baseline we can compare to the intra-season correlation of a save efficiency metric, goals prevented over expected, which has the advantage of being a shot-level split.

The test reveals that goals prevented per shot carries relatively more signal, which was expected. However, the wins metric also contains stability, losing relative power as sample size drops.

Winning on the Reg

Goalies that contribute points above expected in a random handful of games in any given season are more likely to do the same in their other games. Not only does a wins based metric make sense to the soul, statistical testing suggests it carries some repeatable skill.

Final Buzzer

Goalie wins as an absolute number are a fairly weak measure of talent, but they do contain valuable information. Like most analyses, if we can provide the necessary context (goal support and chances against) and apply fair statistical testing, we can begin to learn more about what drives wins. While the measure isn’t vastly superior to save efficiency, it does contain some decent signal.

Exploring goaltender win contributions with more advanced methods is important. Wins are the bottom line, they drive franchise decisions, and frame the narrative around teams and athletes. Smart deep dives may be able to identify cases which poor win-loss records are bad luck and which have more serious underlying causes.

A quick look at season-level total goals prevented and PAX (the metrics we compared above) show an additional goal prevented is worth about 0.37 points in the standings, which is supported by the 3-1-1 rule of thumb, or more precisely,  2.73 goals per point calculated in Vollman’s Hockey Abstract. Goal prevention explains about 0.69 of the variance in PAX, so the other 0.31 of the variance may include randomness and (in)ability to win. Saves are still the best way to deliver wins, but there’s more to the story.

Save saves for when they matter?


When I was a goalie, it was helpful to constantly reaffirm my job: give my team a chance to win. I couldn’t score goals, I couldn’t force teams to take shots favorable to me, so removing that big W from the equation helped me focus on what I could control: maximizing the probability of winning regardless of the circumstances.

This is what matters to goalies, their contribution to wins. Saves are great, but a lot of them could be made by a floating chest protector. While the current iteration of the ‘Goalie Points Above Expected’ metric isn’t perfect, hopefully it is enlightening. Goalies flip game probabilities on their head all the time, creating a metric to capture that information is an important step in figuring out what drives those wins.

Thanks for reading! I hope to make data publicly available and/or host an app for reference.  Any custom requests ping me at @crowdscoutsprts or cole92anderson@gmail.com.

Code for this analysis was built off a scraper built by @36Hobbit which can be found at github.com/HarryShomer/Hockey-Scraper.

I also implement shot location adjustment outlined by Schuckers and Curro and adapted by @OilersNerdAlert. Any implementation issues are my fault.

My code for this and other analyses can be found on my Github, including the feature generation and modeling of current xG and xRebound models and PAX calculations.


[1] I personally averaged 1 point/season, so this assumption doesn’t always hold.

[2] Adequately screaming at defensemen to cover the slot or third forwards to stay high in the offensive zone is also assumed.

[3] If a goalie makes a huge save late in a tie game and subsequently win in overtime, the overtime goal was conditional on the play of the goalie, making the win (with an extra goal in support) look easier than it would have otherwise.

[4] Despite it partially delegitimizing my offensive production in college.

[5] Hockey analysts can look to baseball for how advanced analysis aided by more granular data can begin to lend credence to concepts that had been dismissed as an intangible or randomness explained by a narrative.

[6] Note that the split of PAX is at the game-level, which makes it kind of clunky.  Splitting randomly will mean some splits will have more or less games, possibly making it tougher to find a significant correlation. This isn’t really a concern with thousands of shots.

[7]The ugly truth is that an analyst with a point to prove could easily show a strong result for their metric by finding a friendly combination random split and minimum games threshold. So let’s test and report all combinations.

Expected Goals (xG), Uncertainty, and Bayesian Goalies

All xG model code can be found on GitHub.

Expected Goals (xG) Recipe

If you’re reading this, you’re likely familiar with the idea behind expected goals (xG), whether from soccer analytics, early work done by Alan RyderBrian MacDonald, or current models by DTMAboutHeart and Asmean, Corsica, Moneypuck, or things I’ve put up on Twitter. Each model attempts to create a probability of each shot being a goal (xG) given the shot’s attributes like shot location, strength, shot type, preceding events, shooter skill, etc. There are also private companies supplementing these features with additional data (most importantly pre-shot puck movement on non-rebound shots and some sort of traffic/sight-line metric) but this is not public or generated in the real-time so will not be discussed here.[1]

To assign a probability (between 0% and 100%) to each shot, most xG models likely use logistic regression – a workhorse in many industry response models. As you can imagine the critical aspect of an xG model, and any model, becomes feature generation – the practice of turning raw, unstructured data into useful explanatory variables. NHL play-by-play data requires plenty of preparation to properly train an xG model. I have made the following adjustments to date:

  • Adjust for recorded shot distance bias in each rink. This is done by using a cumulative density function for shots taken in games where the team is away and apply that density function to the home rink in case their home scorer is biased. For example (with totally made up numbers), when Boston is on the road their games see 10% of shots within 5 feet of the goal, 20% of shots within 10 feet of the goal, etc. We can adjust the shot distance in their home rink to be the same since the biases of 29 data-recorders should be less than a single Boston data-recorder. If at home in Boston, 10% of the shots were within 10 feet of the goal, we might suspect that the scorer in Boston is systematically recording shots further away from the net than other rinks. We assume games with that team result in similar event coordinates both home and away and we can transform the home distribution to match the away distribution. Below demonstrates how distributions can differ between home and away games, highlighting the probable bias Boston and NY Rangers scorer that season and was adjusted for. Note we also don’t necessarily want to transform by an average, since the bias is not necessarily uniform throughout the spectrum of shot distances.
home rink bias
No Place Like Home
  • Figure out what events lead up to the shot, what zone they took place in, and the time lapsed between these events and the eventual shot while ensuring stoppages in play are caught.
  • Limit to just shots on goal. Misses include information, but like shot distance contain scorer bias. Some scorers are more likely to record a missed shot than others. Unlike shots where we have a recorded event, and it’s just biased, adjusting for misses would require ‘inventing’ occurrences in order to adjust biases in certain rinks, which seems dangerous. It’s best to ignore misses for now, particularly because the majority of my analysis focuses on goalies. Splitting the difference between misses caused by the goalie (perhaps through excellent positioning and reputation for not giving up pucks through the body) and those caused by recorder bias seems like a very difficult task. Shots on goal test the goalie directly hence will be the focus for now.
  • Clean goalie and player names. Annoying but necessary – both James and Jimmy Howard make appearances in the data, and they are the same guy.
  • Determine the strength of each team (powerplay for or against or if the goaltender is pulled for an extra attacker). There is a tradeoff here. The coefficients for the interaction of states (i.e. 5v4, 6v5, 4v3 model separately) pick up interesting interactions, but should significant instability from season to season. For example, 3v3 went from a penalty-box filled improbability to a common occurrence to finish overtime games. Alternatively, shooter strength and goalie strength can be model separately, this is more stable but less interesting.
  • Determine the goaltender and shooter handedness and position from look-up tables.
  • Determine which end of the ice and what coordinates (positive or negative) the home team is based, using recordings in any given period and rink-adjusting coordinates accordingly.
  • Calculate shot distance and shot angle. Determine what side of the ice the shot is from, whether or not it is the shooters off-wing based on handedness.
  • Tag shots as rushes or rebound, and if a rebound how far the puck travelled and the angular velocity of the puck from shot 1 to shot 2.
  • Calculate ‘shooting talent’ – a regressed version of shooting percentage using the Kuder-Richardson Formula 21, employed the same way as in DTMAboutHeart and Asmean‘s xG model.

All of this is to say there is a lot going on under the hood, the results are reliant on the data being recorded, processed, adjusted, and calculated properly. Importantly, the cleaning and adjustments to the data will never be complete, only issues that haven’t been discovered or adjusted for yet. There is no perfect xG model, nor is it possible to create one from the publicly available data, so it is important to concede that there will be some errors, but the goal is to prevent systemic errors that might bias the model. But these models do add useful information regular shot attempt models cannot, creating results that are more robust and useful as we will see.

Current xG Model

The current xG model does not use all developed features. Some didn’t contain enough unique information, perhaps over-shadowed by other explanatory variables. Some might have been generated on sparse or inconsistent data. Hopefully, current features can be improved or new features created.

While the xG model will continue to be optimized to better maximize out of sample performance, the discussion below captures a snapshot of the model. All cleanly recorded shots from 2007 to present are included, randomly split into 10 folds. Each of the 10 folds were then used a testing dataset (checking to see if the model correctly predicted a goal or not by comparing it to actual goals) while the other 9 corresponding folders were used to train the model. In this way, all reported performance metrics consist of comparing model predictions on the unseen data in the testing dataset to what actually happened. This is known as k-fold cross-validation and is fairly common practice in data science.

When we rank-order the predicted xG from highest to lowest probability we can compare the share of goals that occur to shots ordered randomly. This gives us a gains chart, a graphic representation of the how well the model is at finding actual goals relative to selecting shots randomly. We can also calculate the Area Under the Curve (AUC), where 1 is a perfect model and 0.5 is a random model. Think of the random model in this case as shot attempt measurement, treating all shots as equally likely to be a goal. The xG model has an AUC of about 0.75, which is good, and safely in between perfect and random. The most dangerous 25% of shots as selected by the model make up about 60% of actual goals. While there’s irreducible error and model limitations, in practice it is an improvement over unweighted shot attempts and accumulates meaningful sample size quicker than goals for and against.

gains chart
Gains, better than random

Hockey is also a zero-sum game. Goals (and expected goals) only matter relative to league average. Original iterations of the expected goal model built on a decade of data show that goals were becoming dearer compared to what was expected. Perhaps goaltenders were getting better, or league data-scorers were recording events to make things look harder than they were, or defensive structures were impacting the latent factors in the model or some combination of these explanations.

Without the means to properly separate these effects, each season receives it own weights for each factor. John McCool had originally discussed season-to-season instability of xG coefficients. Certainly this model contains some coefficient instability, particularly in the shot type variables. But overall these magnitudes adjust to equate each seasons xG to actual goals. Predicting a 2017-18 goal would require additional analysis and smartly weighting past models.

Coefficient Stability
Less volatile than goalies?

xG in Action

Every shot has a chance of going in, ranging from next to zero to close to certainty.  Each shot in the sample is there because the shooter believed there was some sort of benefit to shooting, rather than passing or dumping the puck, so we don’t see a bunch of shots from the far end of the rink, for example. xG then assigns a probability to each shot of being a goal, based on the explanatory variables generated from the NHL data – shot distance, shot angle, is the shot a rebound?, listed above.

Modeling each season separately, total season xG will be very close to actual goals. This also grades goaltenders on a curve against other goaltenders each season. If you are stopping 92% of shots, but others are stopping 93% of shots (assuming the same quality of shots) then you are on average costing your team a goal every 100 shots. This works out to about 7 points in the standings assuming a 2100 shot season workload and that an extra 3 goals against will cost a team 1 point in the standings. Using xG to measure goaltending performance makes sense because it puts each goalie on equal footing as far as what is expected, based on the information that is available.

We can normalize the number of goals prevented by the number of shots against to create a metric, Quality Rules Everything Around Me (QREAM), Expected Goals – Actual Goals per 100 Shots. Splitting each goalie season into random halves allows us to look at the correlation between the two halves. A metric that captures 100% skill would have a correlation of 1. If a goaltender prevented 1 goal every 100 shots, we would expect to see that hold up in each random split. A completely useless metric would have an intra-season correlation of 0, picking numbers out of a hat would re-create that result. With that frame of reference, intra-season correlations for QREAM are about 0.4 compared to about 0.3 for raw save percentage. Pucks bounce so we would never expect to see a correlation of 1, so this lift is considered to be useful and significant.[2]

intra-season correlations
Goalies doing the splits

Crudely, each goal prevented is worth about 1/3 of a point in the standings. Implying how many goals a goalie prevents compared to average allows us to compute how many points a goalie might create for or cost their team. However, a more sophisticated analysis might compare goal support the goalie receives to the expected goals faced (a bucketed version of that analysis can be found here). Using a win probability model the impact the goalie had on win or losing can be framed as actual wins versus expected.


xG’s also are important because they begin to frame the uncertainty that goes along with goals, chance, and performance. What does the probability of a goal represent? Think of an expected goal as a coin weighted to represent the chance that shot is a goal. Historically, a shot from the blueline might end up a goal only 5% of the time. After 100 shots (or coin flips) will there be exactly 5 goals? Maybe, but maybe not. Same with a rebound from in tight to the net that has a probability of a goal equal to 50%. After 10 shots, we might not see 5 goals scored, like ‘expected.’ 5 goals is the most likely outcome, but anywhere from 0 to 10 is possible on only 10 shots (or coin flips).

We can see how actual goals and expected goals might deviate in small sample sizes, from game to game and even season to season. Luckily, we can use programs like R, Python, or Excel to simulate coin flips or expected goals. A goalie might face 1,000 shots in a season, giving up 90 goals. With historical data, each of those shots can be assigned a probability of a being a goal. If the average probability of a goal is 10%, we expect the goalie to give up 100 goals. But using xG, there are other possible outcomes. Simulating 1 season based on expected goals might result in 105 goals against. Another simulation might be 88 goals against. We can simulate these same shots 1,000 or 10,000 times to get a distribution of outcomes based on expected goals and compare it to the actual goals.

In our example, the goalie possibly prevented 10 goals on 1,000 shots (100 xGA – 90 actual GA). But they also may have prevented 20 or prevented 0. With expected goals and simulations, we can begin to visualize this uncertainty. As the sample size increases, the uncertainty decreases but never evaporates. Goaltending is a simple position, but the range of outcomes, particularly in small samples, can vary due to random chance regardless of performance. Results can vary due to performance (of the goalie, teammates, or opposition) as well, and since we only have one season that actually exists, separating the two is painful. Embracing the variance is helpful and expected goals help create that framework.

It is important to acknowledge that results do not necessarily reflect talent or future or past results. So it is important to incorporate uncertainty into how we think about measuring performance. Expected goal models and simulations can help.

simulated seasons
Hackey statistics

Bayesian Analysis

Luckily, Bayesian analysis can also deal with weighting uncertainty and evidence. First, we set a prior –probability distribution of expected outcomes. Brian MacDonald used mean Even Strength Save Percentage as prior, the distribution of ESSV% of NHL goalies. We can do the same thing with Expected Save Percentage (shots – xG / shots), create a unique prior distribution of outcome for each goalie season depending on the quality of shots faced and the sample size we’ll like to see. Once the prior is set, evidence (saves in our case) is layered on to the prior creating a posterior outcome.

Imagine a goalie facing 100 shots to start their career and, remarkably, making 100 saves. They face 8 total xG against, so we can set the Prior Expected Save% as a distribution centered around 92%. The current evidence at this point is 100 saves on 100 shots, and Bayesian Analysis will combine this information to create a Posterior distribution.

Goaltending is a binary job (save/goal) so we can use a beta distribution to create a distribution of the goaltenders expected (prior) and actual (evidence) save percentage between 0 and 1, like a baseball players batting average will fall between 0 and 1. We also have to set the strength of the prior – how robust the prior is to the new evidence coming in (the shots and saves of the goalie in question). A weak prior would concede to evidence quickly, a hot streak to start a season or career may lead the model to think this goalie may be a Hart candidate or future Hall-of-Famer! A strong prior would assume every goalie is average and require prolonged over or under achieving to convince the model otherwise. Possibly fair, but not revealing any useful information until it has been common knowledge for a while.

bayesian goalie
Priors plus Evidence

More research is required, but I have set the default prior strength of equivalent to 1,000 shots. Teams give up about 2,500 shots a season, so a 1A/1B type goalie would exceed this threshold in most seasons. In my goalie compare app, the prior can be adjusted up or down as a matter of taste or curiosity. Research topics would investigate what prior shot count minimizes season to season performance variability.

Every time a reported result actives your small sample size spidey senses, remember Bayesian analysis is thoroughly unimpressed, dutifully collecting evidence, once shot at a time.


Perfect is often the enemy of the good. Expected goal models fail to completely capture the complex networks and inputs that create goals, but they do improve on current results-based metrics such as shot attempts by a considerable amount.  Their outputs can be conceptualized by fans and players alike, everybody understands a breakaway has a better chance of being a goal than a point shot.

The math behind the model is less accessible, but people, particularly the young, are becoming more comfortable with prediction algorithms in their daily life, from Spotify generating playlists to Amazon recommender systems. Coaches, players, and fans on some level understand not all grade A chances will result in a goal. So while out-chancing the other team in the short term is no guarantee of victory, doing it over the long term is a recipe for success. Removing some the noise that goals contain and the conceptual flaws of raw shot attempts helps the smooth short-term disconnect between performance and results.

My current case study using expected goals is to measure goaltending performance since it’s the simplest position – we don’t need to try to split credit between linemates. Looking at xGA – GA per shot captures more goalie specific skill than save percentage and lends itself to outlining the uncertainty those results contain. Expected goals also allow us to create an informed prior that can be used in a Bayesian hierarchical model. This can quantify the interaction between evidence, sample size, and uncertainty.

Further research topics include predicting goalie season performance using expected goals and posterior predictive distributions.


[1]Without private data or comprehensive tracking data technology analysts are only able to observe outcomes of plays – most importantly goals and shots – but not really what created those results. A great analogy came from football (soccer) analyst Marek Kwiatkowski:

Almost the entire conceptual arsenal that we use today to describe and study football consists of on-the-ball event types, that is to say it maps directly to raw data. We speak of “tackles” and “aerial duels” and “big chances” without pausing to consider whether they are the appropriate unit of analysis. I believe that they are not. That is not to say that the events are not real; but they are merely side effects of a complex and fluid process that is football, and in isolation carry little information about its true nature. To focus on them then is to watch the train passing by looking at the sparks it sets off on the rails.

Armed with only ‘outcome data’ rather than comprehensive ‘inputs data’ analyst most models will be best served with a logistic regression. Logistic regression often bests complex models, often generalizing better than machine learning procedures. However, it will become important to lean on machine learning models as reliable ‘input’ data becomes available in order to capture the deep networks of effects that lead to goal creation and prevention. Right now we only capture snapshots, thus logistic regression should perform fine in most cases.

[2] Most people readily acknowledge some share of results in hockey are luck. Is the number closer to 60% (given the repeatable skill in my model is about 40%), or can it be reduced to 0% because my model is quite weak? The current model can be improved with more diligent feature generation and adding key features like pre-shot puck movement and some sort of traffic metric. This is interesting because traditionally logistic regression models see diminishing marginal returns from adding more variables, so while I am missing 2 big factors in predicting goals, the intra-seasonal correlation might only go from 40% to 50%. However, deep learning networks that can capture deeper interactions between variables might see an overweight benefit from these additional ‘input’ variables (possibly capturing deeper networks of effects), pushing the correlation and skill capture much higher. I have not attempted to predict goals using deep learning methods to date.