Why Project Performance?

The first instinct upon seeing goaltending projections should be: why? Goaltending is a notoriously volatile position that makes little sense even to those with a deep understanding of the game. In any given season, high-pedigree goaltenders and also-rans are seemingly as likely as each other to deliver top performances. A perceived star having a poor year can sink a promising season.

But that magnitude of impact makes it an interesting and useful exercise. Goaltending, though volatile, exerts an outsized influence on games and seasons, for better or worse. If your goaltender is on, the game is easy; if they are off, everyone invested in the team is just waiting for something to go wrong.

Importantly, volatility is something statistics can capture and quantify, along with the potential impact on the team. In a league where the gap in true skill from team to team can be tight, that impact is relatively large. Last regular season, goalies made up 11 of the top 30 WAR (Wins Above Replacement) contributors, according to corsica.hockey.

And that’s the crux: a volatile but important position is still important. It is often useful to use data to project future results, no matter how difficult and frustrating the process can be.

It’s important to preface that this analysis deals with a goaltender’s statistical profile rather than true goaltender ability. In a perfect world, we could derive a metric that aligned the two in a meaningful way, and current methods do their best to isolate goaltender performance by adjusting for the quality of shots their team allows. However, there are latent variables characteristic of certain teams. For example, some teams may allow a higher rate of screened shots or cross-ice passes than the recorded shot attributes might suggest. I’ve estimated that team-level latent effects on shot quality can be about 0.2% at even strength and about 0.6% on the powerplay.
Therefore, all projections suggest which goalies will likely return the best results rather than which goalies are definitively better. Results are influenced by ability, health, age, opportunity, coaching, and team effects, all contributing to the difficulty of prediction.

What Result Do We Care About?

In order to create a projection, we first must decide what to measure over the upcoming season. Some fantasy websites might project games played, wins, or standard save percentage. However, we want a metric that best isolates goaltender performance, given the available data. Publicly, this is currently best done by adjusting save percentage using expected save percentage from an expected goal (xG) model. Each shot is weighted by the probability of it being a goal, given what we know about the shot, and measured against actual goals against.
Metric stability

Rebound Control

Rebound shots weigh heavily in expected save percentage calculations, and rightfully so: rebound shots are about 4 times more dangerous than initial shots (shooting percentages of about 26% and 6%, respectively). However, rebounds are not necessarily independent of the goaltender; in theory, the goaltender has some control over the rebound opportunities against them.
Determining whether rebound prevention is a repeatable ‘skill’ is tricky – unlike a goal against, there is no solid definition of a rebound. The shooter, goalie, rebounder, defender, and record keeper (rebounds are designated when a follow-up shot comes within 2 seconds of the previous shot) all have some impact on the outcome. However, removing the credit for rebound chances against and replacing it with an expected goal value – an expected rebound probability multiplied by the probability of a goal on a rebound (about 25% of rebound shots end up as goals) – helps remove some of the noise rebound xG creates, leading to more stable predictions.
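A minimal sketch of this adjustment, assuming hypothetical shot records that carry an xG value, an expected-rebound probability, and a flag marking shots that were themselves rebounds:

```python
# Sketch of rebound-adjusted expected goals. Field names and values
# are invented for illustration; only the 25% rebound conversion rate
# comes from the text.
REBOUND_CONV = 0.25  # ~25% of rebound shots are converted to goals

def rebound_adjusted_xg(shots):
    """Replace rebound-shot xG with xRebound * P(goal | rebound)."""
    total = 0.0
    for shot in shots:
        if shot["is_rebound"]:
            continue  # drop the rebound shot's own xG entirely
        # credit the initial shot with its xG plus the expected value
        # of the rebound it might generate
        total += shot["xg"] + shot["x_rebound"] * REBOUND_CONV
    return total

shots = [
    {"xg": 0.06, "x_rebound": 0.04, "is_rebound": False},
    {"xg": 0.26, "x_rebound": 0.02, "is_rebound": True},  # excluded
]
print(round(rebound_adjusted_xg(shots), 4))  # 0.06 + 0.04*0.25 = 0.07
```

The rebound shot's large raw xG (0.26) never enters the total; the initial shot is instead credited with the expected value of the rebound it created.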

Non-Naive Bayes

Results are often complicated by sample size: more shots mean more information. While we could include only goalie-seasons with at least X shots, such cut-offs can be arbitrary and can be fiddled with to create spurious results (a 1,000-shot minimum looks like this, but 1,200 looks like this). My approach is to add a regressor to observed results that pulls the goaltender back toward a single-number prediction heading into the season, based on a simple linear model. The model inputs are last season’s results, shots against, partner performance, age, and whether it was a rookie season. This prediction acts as the prior (prior probability distribution, red line below), our best guess of how the goaltender will perform that particular season before allowing evidence (results) to pile up.
If a 25-year-old rookie is brought up from the AHL, we will probably expect below-average results, say an extra goal against every 100 shots (-1%). If they post a 30-save shutout in their first game, the evidence (the shutout) wouldn’t necessarily overwhelm the prior. Combining our prior beliefs and the evidence (realized save percentage) into a posterior (posterior probability distribution, blue line below), our updated estimate of their results will be better than the prior of -1%, but not by much. However, after 10 games of superb results, the posterior will begin to move into positive territory.
Piling on the evidence

How quickly does the evidence overwhelm the prior? That depends on the prior strength. We can imagine the prior as a synthetic goalie put in net for a set number of shots recording the same results as the prior expectation of them. So if we have a strong prior, we might ‘simulate’ close to a season of data before considering actual results. A weak prior might only be a hundred shots. The weaker the prior, the quicker the actual results and posterior results converge, as seen above.
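The synthetic-goalie idea can be sketched directly: treat the prior as a block of pseudo-shots saved at exactly the prior expectation, then pool with observed results. The league-average save % and prior strength below are assumed numbers for illustration.

```python
# Sketch of the prior-as-pseudo-shots update, with made-up numbers.
LEAGUE_AVG = 0.915  # assumed league-average save %

def posterior_lift(prior_lift, prior_shots, saves, shots):
    """Blend a prior save % lift with observed results.

    The prior acts like a synthetic goalie who faced `prior_shots`
    shots at exactly the prior expectation.
    """
    prior_sv = LEAGUE_AVG + prior_lift
    observed_sv = saves / shots
    post_sv = (prior_sv * prior_shots + observed_sv * shots) / (prior_shots + shots)
    return post_sv - LEAGUE_AVG

# A rookie with a -1% prior worth 1,000 pseudo-shots posts a 30-save
# shutout: the posterior barely moves off the prior.
print(round(posterior_lift(-0.01, 1000, 30, 30), 4))  # still below zero
```

With a weaker prior (say 100 pseudo-shots), the same shutout pulls the posterior much further toward the observed results, which is the convergence behaviour described above.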

What prior strength best stabilizes results over the season for use in prediction? We will test that out later.

Target Data

The metric I’m choosing to measure is:
1. Save % Lift Over Expected – compare actual save % to the expected save % derived from an expected goal model (which considers shot location, shot type, strength, shooter, and the time and location of the event prior to the shot).
2. Regressed – using a Bayesian approach we will test various prior strengths in order to create a metric with a good balance between efficiency and workload.
3. Rebound Adjusted – Removing some of the noise that rebounds can add when using expected goal models to measure shot quality faced by a goalie.
This metric satisfies both philosophically and statistically. Philosophically, we are measuring goaltender performance based on what they do with each initial (non-rebound) shot against, given the features we know about it, and indexing their results to league average.
Statistically, when trying to predict future results, this metric performs better than raw save %, save % over expected unadjusted for rebounds, and unregressed save % over expected adjusted for rebounds. Though this isn’t always a high bar to clear.
Prior work from RITHAC 2017

The Marcel Framework

The easiest way to forecast the future is to look at the past. But how far into the past? Is yesterday more relevant than the day before it, and by how much?
A standard method of forecasting athlete performance uses the marcel framework, which has its roots in baseball and has been adapted to hockey numerous times. Results from prior seasons are aggregated and given less weight the further in the past they are.
A two-season marcel projecting 2018-19 results might weight 2017-18 results by 75% and 2016-17 results by 25%, totalling 100%. If we wanted a feature to represent goaltender shots faced, and in 2017-18 they faced 2,000 and in 2016-17 they faced 1,000 shots, using the 75-25 weights, our representation of shots faced would be 1,750 ((2000 * 0.75) + (1000 * 0.25)).
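The 75/25 example above can be sketched as a one-line weighted sum:

```python
# Marcel-style weighting of prior-season stats. The 75/25 weights and
# shot totals match the worked example in the text.
def marcel_weight(values, weights):
    """Weighted aggregate of prior seasons, most recent first."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights should total 100%"
    return sum(v * w for v, w in zip(values, weights))

shots_faced = [2000, 1000]  # 2017-18, then 2016-17
print(marcel_weight(shots_faced, [0.75, 0.25]))  # 1750.0
```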
Like our prior strength parameter, the parameters that best capture history (lookback seasons) and recency (how to weight each season) can be tested.

Building the Grid

The goal of the analysis is to best predict future performance, and we have a few parameters we want to test to best generate model inputs and targets – prior strength, marcel lookback seasons, and relative weighting of lookback seasons. For each parameter, we can test various values (i.e. 100, 400… 3000 prior shots, 1…5 lookback seasons, 10 different weighting configurations) and then test model performance for each of the 350 unique combinations of parameters.
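The grid itself is just a Cartesian product of the candidate values. The specific values below are illustrative stand-ins that reproduce the 350-combination count:

```python
# Building the parameter grid: every combination of prior strength,
# lookback seasons, and season weighting. Candidate values are
# illustrative placeholders.
from itertools import product

prior_shots = [100, 400, 800, 1200, 1600, 2000, 3000]  # 7 options
lookback_seasons = [1, 2, 3, 4, 5]                     # 5 options
weight_schemes = list(range(10))  # stand-in for 10 weighting configs

grid = list(product(prior_shots, lookback_seasons, weight_schemes))
print(len(grid))  # 7 * 5 * 10 = 350 combinations
```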

Under the Hood

Each parameter combination is used to create the:
1. Target variable – regressed, rebound adjusted save % over expected
2. Input features
1. Marcel-weighted regressed, rebound adjusted save % over expected
2. Marcel-weighted shots against
3. Marcel-weighted even-strength rebound adjusted save % over expected
4. Marcel-weighted rebound adjusted save % over expected of partner goaltenders
5. Age

For each test season, we calculate the target variable and aggregate the input metrics from prior seasons. We can then train a few different models exploring the relationship between marcel-weighted prior metrics and unseen future results.

Each model splits out 80% of the 576 goalie-seasons from 2010-11 to train on. The caret package is used to create a cross-validated model by splitting the data into 5 folds and repeating the process 5 times, in order to find the optimal tuning parameters. The remaining 20% of the data is held out, and model performance is measured on that unseen data. Four models are fit.
1. Random Forest Model (4 inputs) – input features of regressed results, shots against, prior even-strength results, and age. This decision tree looks for splits in the data that might be useful in predicting future performance.
2. Linear Model (3 inputs)  – input features of regressed results, shots against, and prior even-strength results. Simple model solely based on prior results.
3. Linear Model (4 inputs)  – input features of regressed results, shots against, prior even-strength results, and age. The model hopes to balance performance with age.
4. Linear Model (5 inputs)  – input features of regressed results, shots against, prior even-strength results, age, and performance of partner goalies.
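The resampling scheme described above is built in R with caret; a rough stdlib-Python analogue of the same idea (80/20 holdout, then 5-fold cross-validation repeated 5 times on the training portion) looks like this, with index placeholders standing in for the goalie-season data:

```python
# Python analogue of the caret resampling scheme used in the text;
# the actual analysis is done in R.
import random

def repeated_kfold(n, k=5, repeats=5, seed=42):
    """Yield (train_idx, val_idx) splits: k folds per repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            val = set(idx[fold::k])  # every k-th shuffled index
            yield [i for i in idx if i not in val], sorted(val)

goalie_seasons = 576
train_n = int(goalie_seasons * 0.8)  # 80% to train, 20% held out

splits = list(repeated_kfold(train_n))
print(len(splits))  # 25 splits: 5 folds x 5 repeats
```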
Each model is then applied to the roughly 60 goaltenders with NHL experience likely to be on opening day rosters. For each of the 4 model predictions, we have 350 different parameter calculations; only parameterizations with good out-of-sample testing scores are considered. Those out-of-sample scores are then used to take a weighted average of the predictions, along with their confidence intervals. Finally, the 4 model predictions and confidence intervals are averaged together to represent a reasonable forecast for the upcoming season.
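The score-weighted averaging step can be sketched as follows; the prediction values, scores, and quality cut-off are invented for illustration:

```python
# Sketch of combining many parameterizations of one model into a
# single forecast, weighting each prediction by its out-of-sample
# score (higher = better). All numbers are made up.
def score_weighted_forecast(preds, scores, min_score=0.0):
    """Average predictions, weighting by out-of-sample score and
    dropping parameterizations below a quality threshold."""
    kept = [(p, s) for p, s in zip(preds, scores) if s > min_score]
    total = sum(s for _, s in kept)
    return sum(p * s for p, s in kept) / total

preds = [0.010, 0.006, 0.002]  # save % lift forecasts
scores = [0.5, 0.3, -0.1]      # last one fails the quality cut
print(round(score_weighted_forecast(preds, scores), 4))  # 0.0085
```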

Results

Each goaltender has a forecast presented with a range of results, given their statistical profile and the modelling process. A lower peak and wider plot distribution represent a more uncertain prediction. It appears that age and prior inconsistency generally increase the uncertainty, which makes intuitive sense. However, due to the nature of the modelling process, the exact relationship is a bit obfuscated.

It’s also important to note that this metric represents both efficiency (per shot) and workload. Goaltenders that have demonstrated the ability to handle a heavy schedule, like Frederik Andersen, are given more credit since their above average results will likely be across more shots (overcoming the regressor). Taking extra starts from a back-up or replacement-level goaltender will likely benefit the team.

There’s obviously a lot of overlap between many goalies, which might make it unclear how exactly a decision-maker could glean information from the analysis. It might be more helpful to simulate seasons by ‘drawing’ results from the calculated distributions and comparing results to peers, like we would in the card game ‘War.’ If we sample from the distributions of Braden Holtby and Peter Budaj 1,000 times, Budaj would post superior results about 3% of the time.
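The ‘War’-style comparison can be sketched by drawing each goalie's season from a forecast distribution. Normal distributions are assumed here, and the means and spreads are invented rather than the actual Holtby/Budaj forecasts:

```python
# 'War'-style season simulation: draw each goalie's result from an
# assumed normal forecast distribution and count who comes out ahead.
# The means and standard deviations are invented for illustration.
import random

def head_to_head(mu_a, sd_a, mu_b, sd_b, n=1000, seed=1):
    """Fraction of simulated seasons where goalie B beats goalie A."""
    rng = random.Random(seed)
    wins_b = sum(
        rng.gauss(mu_b, sd_b) > rng.gauss(mu_a, sd_a) for _ in range(n)
    )
    return wins_b / n

# Hypothetical: a strong starter vs a weaker backup
frac = head_to_head(mu_a=0.010, sd_a=0.005, mu_b=-0.005, sd_b=0.006)
print(frac)  # a small fraction of simulated seasons favour the backup
```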

This exercise can be done for each team with veteran goalies in their system against 2 veteran free-agent goalies, Kari Lehtonen and Steve Mason. While goalies like Greiss and Darling are projected to outplay Steve Mason in only about 20% of simulated seasons, this apparent gamble could also factor in things like contract status, age, or injury risk. In any event, we can capture the uncertainty and provide the opportunity to make a calculated decision.

Calculated risks

Bottom Line

An alternative calculation is to simulate absolute goals prevented over expected for each team. Based on rostered goaltenders’ forecasted outcomes, we can create a distribution of possible outcomes by simulating their season thousands of times. As a point of reference, last season that range was about +/- 40 goals, representing about a 15-point swing in the standings. There are no certain outcomes, but you can maximize the probability of ending up in positive territory.

Simulated Seasons

Conclusion

Every season brings its own hard lessons on how difficult it can be to predict goaltender performance. Therefore, it makes sense that any forecast shouldn’t avoid uncertainty, but rather try to embrace it.

Teams and decision-makers are best aided by understanding that future performance is only probabilistic. Carey Price might be one of the most talented goaltenders in the league, but how likely was his poor performance last season? Unlikely, but the probability was certainly not zero. That’s true of every goalie heading into the 2018-19 season.

The universe of goaltenders is more talented than ever, so it’s no surprise that the top talents in the world, when indexed against each other, are not separated by much. That means that as the upcoming season unfolds, the results we observe will quickly deviate from what is expected in many cases. In some of those cases, results will reconverge; in others, the opportunity might be lost to injury or an opportunistic teammate.

But it is important to know what to expect from goaltenders. Evaluators might have an easier time forecasting bottom-6 skater performance, but the impact on the outcome of the season is considerably smaller.

Teams only get to place a few chips on goaltenders each season; the edge might be small, but the payoffs compound over the course of the season and are often season-defining. A statistical forecasting approach that incorporates uncertainty can help them quantify that bet.

Thanks for reading! Any custom requests ping me at @crowdscoutsprts or cole92anderson@gmail.com. Code for this and other analyses can be found on my Github.

Clutch Off the Bench

The 2018 1st round series featuring the Columbus Blue Jackets and Washington Capitals was an interesting case study in playoff goaltending performance.

Starring for Columbus was Sergei Bobrovsky. The reigning Vezina Trophy winner was coming off another very good season, hoping to keep rolling into the playoffs. However, despite Bobrovsky’s accolades, he had never advanced past the 1st round and had been uncharacteristically below average at preventing goals in all of his prior playoff appearances.

In another subplot, Washington actually began the series with Philipp Grubauer. He had been excellent in the regular season but was relatively untested in the playoffs (he had played parts of two clean-up games, neither of which went great). This decision put Braden Holtby on the bench, who had had a very pedestrian regular season but had been at or above average in all 5 of his previous playoff appearances. Cumulatively, playoff Holtby had prevented about 1.1% more goals than expected, 2nd only to Jonathan Quick among goalies entering the playoffs with at least 1,000 playoff shots to their names.

Everyone knows what happened next. Grubauer wasn’t great, while Washington dropped their 1st two games at home. Holtby came in and delivered 4 straight above average performances, while Bobrovsky ended the series with 3 straight below average games. Washington took the series 4-2.

A few interesting questions come out of this series that I hope to explore. Was Washington coach Barry Trotz right to go with the ‘hot hand’ over the ‘proven vet’ by starting Grubauer? Is it likely that a goalie might be good in the regular season but below average in the playoffs? More generally, if we are trying to explain goaltending performance in the playoffs, what matters more: past playoff performances, regular season results, or career results?

Can someone simply turn it on in the playoffs after a below-average regular season?

High Stakes Noise

Let me preface most of this exploration with the understanding that the idea of ‘clutch’ or ‘performance when it matters’ is problematic from a statistical perspective. A few bounces over a playoff series might dictate whether the outcome is perceived as ‘clutch’ or ‘choking.’ In reality, a good game or a bad game doesn’t have much effect on the outcome of the next game, but if you flip a few heads in a row (bad games) you are out of the playoffs, while a few tails mean you advance. Someone has to advance, so a ‘clutch’ narrative might be created from chance outcomes alone.

With a small sample like a playoff series, a bounce or two can change the narrative of those outcomes. Analysts can deal with this by framing the outcome with a range of uncertainty. Fewer shots or games mean more uncertainty. Ultimately, we can’t be too sure the outcome of a series reflects the ‘truth.’ Holtby could have come into the playoffs with his game in top condition and his vitals in the optimal range to deliver a clutch performance, but if a few Seth Jones’ shots bounced off of someone’s ass in game 3 or 4, the narrative is completely different. Drilling down further (tied in the 3rd period only, etc) only compounds the problem of insufficient sample size.

Is Winning a Skill?

It’s important to the scientific process that we assume the null hypothesis and then look for data to reject it. A ‘clutchness’ factor is no different: we should assume it doesn’t exist. It might not exist as a differentiator at the NHL level for good reason – a propensity to fold in critical moments would likely prevent a player from making it that far.

However, this doesn’t feel right. I’ve played with the pressure of losing the game 1-0, and it’s certainly easier than winning 2-1. Goaltending can easily be the equalizer between a dominant team and a dominant win, possibly even flipping the script to a loss. Goal prevention is the best way for a goalie to win games. However, it’s possible that some goalies might be consistently better in crunch time than their goal prevention would suggest.

Regardless of whether you think being clutch is an innate skill some have, or whether those differences are incredibly tiny at the NHL level, we have to acknowledge that the finite and imperfect nature of the data will likely be a limiting factor.

What Does the Data Look Like?

The objective of this analysis is to explain goaltender playoff performances using data available prior to round 1, game 1. The target of interest is playoff goal prevention per shot: save % less expected save %. If a goaltender faced 25 expected goals on 250 shots but conceded only 20 actual goals, this would be a 2% lift (5 / 250, or 92% – 90%). Actual save % may deviate wildly from expected save % in small samples like the playoffs, and a few bad goals and/or unlucky bounces may leave no chance of redemption.
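The worked example above reduces to a small calculation:

```python
# The example from the text: 25 expected goals on 250 shots with 20
# actual goals against yields a +2% save % lift over expected.
def sv_pct_lift(expected_goals, actual_goals, shots):
    actual_sv = 1 - actual_goals / shots      # 92%
    expected_sv = 1 - expected_goals / shots  # 90%
    return actual_sv - expected_sv            # equivalently (xGA - GA) / shots

print(sv_pct_lift(25, 20, 250))  # ≈ 0.02
```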

To help explain the selected measure of playoff performance for each season, the save % lift can be calculated for:

• the regular season performance prior to that playoffs
• entire career regular season performance prior to that playoffs
• entire career playoff performance prior to that playoffs
• a proxy for goaltender workload at the onset of the playoffs

Visualizing the relationships between these save % measures, we see a small correlation between each of them and playoff results. As predicted, the variance in playoff results (y-axis) is higher than in the explanatory variables (x-axis), which are built on larger samples. Initially, it appears regular season results are the most correlated with playoff success (a perfect correlation would be equal to 1, with each point falling along the grey diagonal line). Career regular season results have the least variance and the lowest correlation.

Do any or all of these metrics matter when explaining playoff performance?

The Weight of the Playoffs

In order to understand how each of the explanatory inputs matter we can use a multiple linear regression. This helps us quantify the direction and strength of the relationship between the explanatory variables and playoff performance.

Running a regression on 122 goalie-seasons facing at least 100 shots in the playoffs and 1,000 shots in the respective regular season results in the model below.

Variable                                           Coefficient  Std. Error  t value  Pr(>|t|)
(Intercept)                                             0.0011        0.00    0.253     0.801
Career Playoffs Sv% Lift                                0.2176        0.11    1.987    0.0494 *
Regular Season Sv% Lift                                 0.8485        0.27    3.15      0.002 **
Career Regular Season Sv% Lift                          0.1203        0.33    0.37      0.712
Weighted Shots in 15-Day Window Prior to Playoffs       0.0000        0.00   -1.131     0.261
Playoff Rookie                                          0.0004        0.00    0.089     0.929

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.01536 on 124 degrees of freedom
Multiple R-squared: 0.1342, Adjusted R-squared: 0.09341
F-statistic: 4.292 on 4 and 124 DF, p-value: 0.002752

Notably, this is a pretty weak model, confirming the intuition that playoff performance is tough to explain. But directionally, regular season results are more significant, and their coefficient is larger than that of career playoff results. Also noteworthy: career regular season results have no significant effect (though directionally positive) on playoff results once the current season and career playoff results are controlled for. Workload has no significant effect, though it is directionally negative. Being a playoff rookie also has no significant effect.

Formula For Success

Dropping the insignificant variables and re-running the regression creates the formula below to (loosely) calculate the expected playoff results.

So, for example, Holtby entered the 2018 playoffs with a save % lift of +1.14% in prior playoffs, but only -0.18% in an uncharacteristically mediocre regular season. The regular season results are weighted about 4 times more important in the formula, resulting in an expected -0.2% save % lift in these playoffs, which he’s exceeded to date.

Bobrovsky’s prior playoff results (-2.27%) pulled down his regular season results (+0.5%), expecting a -0.33% performance. He finished with a -1.22%.

Despite a great regular season, Grubauer’s expected save % was about 0% due to poor prior playoff appearances pulling it down.

Summary

If there’s anything to take away from this analysis, it’s that explaining playoff performances is difficult. This is likely obvious to anyone who’s watched playoff hockey. Small sample sizes, survivorship bias, and out-of-control narratives: playoff hockey has everything needed to confound a good analysis.

That said, some things do matter directionally. Entering the playoffs after a good regular season is probably more important than a good playoff track record. Braden Holtby may have bucked this trend playoffs-to-date, but it was probably more likely his regular season results were lower than his true talent suggests.

The results also suggest that waiting for a goalie’s playoff results to regress to a career average is generally fruitless. This makes sense intuitively: a goaltender may change teams and systems, and they develop and regress. Regular season results likely give enough of a snapshot of where their game is at that entire-career regular season results are unnecessary. Marc-Andre Fleury entered the 2018 playoffs with excellent regular season results, average career regular season results, and below-average playoff results. This was a recipe for success based on the basic model (expected +0.5%, chart above), and he has subsequently delivered excellent results (he currently has the best save % lift among goalies with over 500 shots in the dataset going back to 2011).

With all of these considerations, there is nothing to suggest a goalie can simply turn it on for the playoffs. Proven experience certainly helps, but it’s more important to have posted good results with the most current team and defensive conditions.

Washington Re-Visited

Was Trotz right to start Grubauer? Probably. Playoff series are short, and Grubauer had been excellent during the regular season. However, past playoff results do have a partial explanatory effect, partly because there are other considerations in the playoffs. Playing styles can change, physicality around the net can increase, and facing a well game-planned opposition for 4 to 7 games means that tendencies and tempers can amplify. Holtby had experience in those situations – not enough to completely offset the difference in their regular seasons, but close.

Bobrovsky can take comfort in the fact that his playoff results should have been better than they turned out this season. There’s likely no use in him re-visiting these playoff letdowns; his best bet is to look ahead, focusing on another big season and carrying that performance into the playoffs. Either the results will come naturally, or maybe he will be lifted by some positive unexplained variance.

Thanks for reading! I update goalie-season data using expected goals, it can be downloaded or viewed in my goalie compare app. Any custom requests ping me at @crowdscoutsprts or cole92anderson@gmail.com.

Code for this analysis was built off a scraper built by @36Hobbit which can be found at github.com/HarryShomer/Hockey-Scraper.

I also implement shot location adjustment outlined by Schuckers and Curro and adapted by @OilersNerdAlert. Any implementation issues are my fault.

Analyzing the Impact of the Reverse VH Tactic

Goaltending tactics have evolved considerably in the last 30 years, confirmed by rising save percentages. The Reverse Vertical Horizontal (RVH) is a relatively new goaltender tactic, now widely-adopted. Is there a meaningful impact in the data?

What is the RVH?

Growing up, almost every coach I had wanted me to stand up ‘more’ – more being a relative term. Most made peace with the fact that I was going to try to make the same type of saves as Patrick Roy or Dominik Hasek, but since I was still a kid, it would probably help if I stood up once in a while. Still, they had to choose their battles wisely, and most picked the same hill to die on – bad angle shots. It was simple geometry, really: with the right stick positioning, an adolescent goalie could stand there and cover 100% of the net. However, this just led to the terrible experience of having people hack away at your feet waiting for either a goal or a teammate to save you – glued to the post, you couldn’t cover the puck, and dropping to your knees with the puck that tight would create a hole anybody could hit.

By the time I got to junior and had a goalie coach, we worked in the Vertical-Horizontal (VH) tactic to deal with shots from sharp angles. The short-side pad would seal the post (vertical) and the back leg would drop, sealing the ice (horizontal). There was always a risk of getting your stick tied up and/or getting beat between the post and skate, but used properly it was pretty tough to beat from range. However, there were trade-offs. Leading with the pad tied up the hands a bit, meaning rebounds were more difficult to control. If there was a rebound, the VH was configured to push off the post, but only in one direction. If you had kept your knee tight to the goal line but needed to push to the top of the crease, too bad – you were pushing across the goal line.

The Reverse Vertical Horizontal (RVH) flipped the configuration of the pads, so the strong pad seals the ice (horizontal) and back leg remains anchored (vertical), freeing up the hands and stick more to make plays and allowing rotation with the back leg and push off with the post leg (I would have never dreamed of this, most nets growing up were easy to knock off whenever you needed a convenient whistle). The back leg can anchor or drop into a butterfly quickly which gives the RVH more flexibility when repelling a play originating from a sharp angle compared to the VH.

This added flexibility has meant the RVH has mostly supplanted the VH as a tactic for sharp angle shots, but it’s not perfect either, since it leaves a few holes along the post above the pad, particularly over the shoulder. Additionally, because of its flexibility, some goaltenders become too reliant on it, defaulting to it prematurely or in situations that don’t call for it. Shooters are also able to pick up on trends. After all, throughout the VH and RVH eras it was always an option to play sharp angle shots more passively, by standing up as long as possible (perhaps anticipating a pass or change of angle), or more aggressively, by moving off the post and squaring up. The RVH is a great tactic, but it’s up to the goalie to assess the shooter’s speed, handedness, passing options, and defensive support and make a read rather than simply defaulting to the RVH.

What does the data say?

As early as 2014, InGoal Magazine’s Greg Balloch discussed the RVH being over-used situationally and improperly, including at the NHL level. You don’t need to watch too many highlights to see someone who’s 6’5″ inexplicably getting beat over the shoulder on a bad angle shot because they were leaning on the post in the RVH. Is this a growing problem, or are these just some unfortunate anecdotes while the benefits outweigh the negatives?

Looking at NHL play-by-play data from 2010-2018, we can isolate shots where the RVH has presumably been used properly and possibly improperly, to see if there are any patterns in the:

1. Share of shots resulting in goals (obvious why this matters)
2. Number of shots attempted per game (perhaps RVH has encouraged or discouraged bad angle shots)
3. Share of shots resulting in rebounds (are some tactics more prone to rebounds than others)
4. Shooting % on rebounds, or calculated expected goals on rebounds if the sample size is too small (are some tactics more prone to bad rebounds that are more likely to be converted into goals, or that leave the goalie less likely to make the rebound save)

Observing these metrics over the last 8 seasons might reveal a meaningful change in success rates, but it is important to caution that while this might appear to be a testable tactic, in a complex game like hockey, effects can be hard to pin down. We don’t have passing data to reveal if, for example, a more aggressive tactic led to more passing from the sharp angle and consequently more shots from dangerous locations, though the number of attempts per game might lend a hint.

Either way, it’s possible macro trends don’t reveal anything meaningful, since there’s much that is unobserved and the data itself is imperfect (though the coordinate data has been adjusted to hopefully improve the accuracy of shot location). That said, there may be potentially meaningful and interesting information in the data that might inform a more concentrated deep-dive later.

What does the data look like?

For this analysis, we will focus on bad angle shots where a goalie might select the RVH tactic, either properly or improperly. To do this we can limit to shots taken from a 45° angle or less and within 10 feet from the goal line (visualized below). Further, we will want to breakout combinations of:

1. ‘Close’ vs ‘Long’ Shots – using the cut-off of 12 feet from the net, look at how goalies have dealt with shots where they wouldn’t have time to react, and compare to longer shots.
2. ‘Poor Angle’ vs ‘Decent Angle’ Shots – the RVH is generally recommended on poor angle shots (0°-22.5° from the goal line) but could be over-used when the puck is at a decent angle (22.5°-45°).
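The bucketing above can be sketched directly from shot coordinates. The coordinate system below is an assumption (feet, net centred at x = 89 on the goal line); only the 10-foot, 12-foot, 22.5°, and 45° cut-offs come from the text:

```python
# Bucketing shots by angle from the goal line and distance from the
# net, using the cut-offs described in the text. The coordinate
# convention (net at x=89, y=0, units in feet) is assumed.
import math

NET_X, NET_Y = 89.0, 0.0

def classify_shot(x, y):
    dx = abs(NET_X - x)  # feet out from the goal line
    dy = abs(y - NET_Y)  # feet laterally from the net's centre
    if dx > 10:
        return None  # more than 10 feet out from the goal line
    angle = math.degrees(math.atan2(dx, dy))  # 0 deg = along the goal line
    if angle > 45:
        return None  # outside the bad-angle study area
    angle_bucket = "poor angle" if angle <= 22.5 else "decent angle"
    dist_bucket = "close" if math.hypot(dx, dy) < 12 else "long"
    return (angle_bucket, dist_bucket)

print(classify_shot(88, 6))   # ('poor angle', 'close')
print(classify_shot(80, 10))  # ('decent angle', 'long')
```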

Identifying these combinations of angle types for analysis can be visualized on a rink (cumulative shooting percentage labelled). The average shooting percentage across all shots is about 6.6%, so shots closer than 12 feet from a poor angle are about as dangerous as the average shot, while getting a few feet out to a decent angle improves the shooting percentage by 2%. Another important consideration is that I crudely bucketed this data, which is generally not ideal but helpful for the purpose of this analysis. The coordinate data itself isn’t perfect either, but some home-rink bias adjustment has been applied, so it hopefully won’t be systematically biased across zones or time.

Trends By Season

A quick note about the charts below: they break out shots at 5v5 and 5v4 play separately, since the distribution of shot types we might see from each of the zones above differs by game state. On a 5v4, we’d expect one-timers to make up a larger share of total shots from these zones, increasing the expected shooting percentage, possibly as a result of changing powerplay tactics. Shots at 4v4 or 3v3 are also more likely to be dangerous; if a shooter shoots from a sharp angle in 3v3 overtime, for example, it’s probably because they expect to score.

We must also deal with both signal and noise in the data: are fluctuations in shooting percentage caused by anything material, or just randomness? Our default assumption is that the RVH likely hasn’t had any impact on bad angle shots, and the burden of proof is on the analysis to discover a statistically significant difference in the data. Ideally, we’d have some sort of intervention period where all NHL goalies adopted the RVH. Unfortunately, that will never be the case, so we can only observe loose trends over time at the macro-level.

Without a clear way to compare a “before and after” period for all goalies, we can create uncertainty bars for each season by considering sample size. Say we had observed 10 goals on 100 shots from a particular area; we wouldn’t be too sure in that 10% shooting percentage, since a post here or there and it may have been 6% or 14%. What if we observed 100 goals on 1,000 shots from the same spot? We can be much more sure of that 10%. To reflect the impact sample size has on certainty, the analysis uses the standard deviation of the beta distribution to convey uncertainty, drawing error bars of +/- 1 standard deviation.
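For illustration, those error bars can be computed from the beta distribution’s standard deviation. This is a minimal sketch assuming a Beta(goals, misses) parameterization (the exact parameterization used in the original analysis isn’t stated):

```python
import math

def shot_pct_sd(goals, shots):
    """Std. dev. of a Beta(goals, shots - goals) distribution,
    used as +/- 1 SD error bars on the observed shooting percentage."""
    a, b = goals, shots - goals
    n = a + b
    return math.sqrt(a * b / (n ** 2 * (n + 1)))
```

With 10 goals on 100 shots the standard deviation is about ±3% (so the “true” rate could plausibly be several points either side of 10%), while 100 goals on 1,000 shots shrinks it to under ±1%.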

Shooting Percentage

The primary job of any save selection or tactic is to stop the puck, so naturally, the first trend to look at is shooting percentage from each segment of the ice.

Starting with 5v4 shots, the first trend that jumps out is the rise in shooting % on shots within 12 feet up until the lockout-shortened season, after which it fell dramatically and then slowly rose again. This presumably reflects a cat-and-mouse game between shooters and goalies, but may also involve powerplay and penalty kill strategies countering each other. Shooting percentages on longer bad angle shots followed a more muted version of this trend.

At 5v5, trends are less pronounced. Interestingly, the shooting percentage on close shots from a poor angle jumped above that of shots from a decent angle in 2014-15, which is strange, before normalizing again.

Rebound Percentage

It’s also important that the goalie prevents rebounds on bad angle shots. Rebounds are a bit tricky to define in the play-by-play data but can be estimated by flagging any follow-up shot within 2 seconds of the original.
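As a sketch, flagging rebounds from sorted shot timestamps might look like the function below. This is a simplification of whatever the notebook actually does: a real implementation would also require the shots to belong to the same game and uninterrupted sequence.

```python
def flag_rebounds(shot_times, window=2):
    """Given chronologically sorted shot times (in game seconds) from
    one game, flag each shot that follows the previous shot within
    `window` seconds as a rebound attempt."""
    flags = [False]  # the first shot can never be a rebound
    for prev, cur in zip(shot_times, shot_times[1:]):
        flags.append(0 <= cur - prev <= window)
    return flags
```

So a shot sequence at 10s, 11s, 30s, 31.5s, and 40s would flag the second and fourth shots as rebounds.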

Looking at just 5v5, rebounds have been generally less prevalent than on the average shot (3.4% of shots in 2017-18 resulted in rebounds), but have been trending upward in all 4 areas of the ice. It’s tough to infer a definite trend because we have more uncertainty here (rebounds are rarer than goals), but it seems the rebound rate is not falling.

Rebound Shooting Percentage

Rebounds are a problem because they are very dangerous: they are converted to goals about a quarter of the time, 4 times as often as a non-rebound shot. Are rebounds on shots from poor angles getting more dangerous?

At 5v5, two things are apparent. First, rebounds from poor angles are generally ‘safer’ than average; their shooting percentage is about 10% lower than average. This suggests goalies have generally done a good job of keeping rebounds on the strong side, preventing pucks from getting to the middle of the ice or the weak side for more dangerous extra chances against.

Secondly, because we are dealing with a fraction of a fraction, our sample size is quite small and the error bars are large.

Alternatively, we could look at the expected goal value of the rebounds to reclaim some of this sample, where we can calculate factors such as the total distance the puck travels between shots and the angular velocity the goalie might have to deal with.

Both of these views suggest that there isn’t really a definitive trend since we are working with 3% of the original data (already limited to bad angle shots), making the results pretty noisy. An interesting finding is that rebounds off shots from slightly better angles can be more dangerous than those off poor angle shots, possibly due to goalies not being square to the initial shot in these cases.

Bad Angle Attempts as Share of Total

It’s also important to check whether shooters are attempting more shots from bad angles as a share of total shots. This might be the result of defensive pressure, but it also might signal shooters seeing and testing holes.

It appears most 5v4 powerplays are moving away from bad angle shots, notably on shots over 12 feet. However, at 5v5 there has been a modest increase in attempted bad angle shots from further than 12 feet away (until this season). Some players are definitely happy to test goalies from the seemingly impossible angle – and why not? They don’t have to chase down the puck if they miss.

Shooter Handedness

We can also look at shooter handedness and how it has impacted goal, rebound, and attempt rates over time. Generally, shooters are more trigger happy when they are on their strong side (meaning the shooter’s stick blade is closer to the centre of the ice when they are on their forehand), though success rates for shooters on their weak side are in the same range. Shooters have become less successful on their weak side on the powerplay, but attempts haven’t fallen considerably to reflect this.

Goaltender Specific Trends

These general trends might have some interesting nuggets and reveal things we might want to explore further, but they can’t reveal much regarding tactical usage of the RVH because each goaltender will have implemented it at different times, if at all. While it would be nice to have a definitive list of when goalies might have adopted the RVH tactic, that might be a little simplistic. Early adopters might have had an advantage since shooters hadn’t picked up on its relative weaknesses. It’s also possible (and pertinent to our analysis) that goalies have become over-reliant on it in more recent seasons, defaulting to it in improper situations, which could have an undesirable effect.

Without a completely clean solution, one way to test individual goaltender effectiveness from poor angle shots is to treat each off-season (where tactical changes would normally be implemented) as a divider between a ‘before’ and ‘after’ period. We can calculate save percentage in each period and compare them, testing for statistical significance in each sample. Where the save percentage has a statistically significant difference from the before to after period it might draw interest and warrant a deep look. Was there a tactic change or something else causing a meaningful change in results?

For this part of the analysis, we will limit to the 24 goaltenders who have faced at least 100 bad angle shots a season in at least 5 of the 8 seasons we have data for. Each off-season will be treated as a ‘split’ or intervention period. We’ll only focus on save percentage since rebounds are rarer, making the task of finding meaningful differences tougher.

Saga Bobrovsky

Sergei Bobrovsky had quite the change in the 2012 off-season, going from the Philadelphia Flyers to the Columbus Blue Jackets (along with some time in the KHL waiting for the lockout to end), and ultimately winning the 2013 Vezina Trophy. In Columbus, he also had a new goalie coach, Ian Clark (full disclosure: I both attended and worked at Ian’s goalie schools in the past). Among other things, Clark helped Bobrovsky implement the RVH. If we look at Bobrovsky’s 5v5 save percentage on all bad angle shots from 2010–2012 and compare it to 2013–2018, is it materially different?

In the ‘before’ period, Bobrovsky allowed 14 goals on 194 shots, for a 92.8% success rate. Since then, he’s conceded 25 goals on 743 shots, his save % rising to 96.6%. If we run a test of statistical significance to check whether there is enough evidence (shots) to determine that these proportions are meaningfully different, we get a p-value of 0.029. Stated otherwise, this difference would happen by chance alone about 2.9% of the time (showing my work below).

    2-sample test for equality of proportions with continuity correction

    data:  c(180, 718) out of c(194, 743)
    X-squared = 4.7966, df = 1, p-value = 0.02852
    alternative hypothesis: two.sided
    95 percent confidence interval:
     -0.080419509  0.003384363
    sample estimates:
       prop 1    prop 2
    0.9278351 0.9663526

In the soft sciences, convention suggests the ‘cutoff’ for statistical significance is a p-value of 0.05, so we can say with some certainty that this difference is likely not due to chance. We can never be sure from the data alone, but it seems likely that some combination of the move to Columbus, the new goalie coach, and adoption of the RVH had a positive effect on his save percentage from bad angles.
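For readers without R, the same test can be reproduced in a few lines of stdlib Python: a chi-square test with Yates’ continuity correction on the 2x2 saves/goals table, using the fact that the chi-square survival function with 1 degree of freedom is erfc(sqrt(x/2)). The function name is mine; it simply mirrors R’s prop.test.

```python
import math

def prop_test_2sample(successes, totals):
    """Two-sample equality-of-proportions test with Yates' continuity
    correction, mirroring R's prop.test() on a 2x2 table."""
    (s1, s2), (n1, n2) = successes, totals
    obs = [[s1, n1 - s1], [s2, n2 - s2]]
    rows, cols, n = [n1, n2], [s1 + s2, n1 + n2 - s1 - s2], n1 + n2
    chi2 = sum((abs(obs[i][j] - rows[i] * cols[j] / n) - 0.5) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    # chi-square (df = 1) survival function: erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Bobrovsky: 180 saves on 194 shots before, 718 on 743 after
chi2, p = prop_test_2sample((180, 718), (194, 743))
```

This returns the same X-squared of about 4.80 and p-value of about 0.0285 shown in the R output above.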

Complete Goalie Splits and Save Percentage Results

We can do the same thing with Bobrovsky’s other 6 off-seasons and all 163 unique goalie-offseason splits. The results below label any goalie-offseason where the p-value is less than 0.05. Goalies that experienced a significant change receive their own color; the rest of the 24 qualifying goalies are represented by the ‘Other’ green.

Holtby, Rask, Elliott, Miller, and Bobrovsky all saw a notable rise in save percentage occurring somewhere between 2011 and 2014. Some of this might be attributable to tactical changes, though without talking to goalies and their coaches and/or grinding on video we can’t say for sure. However, it’s possible that RVH adoption helped drive some of this effect.

Luongo, Varlamov, and Price have all experienced a notable drop in success rate. Luongo stands out because he likely adopted the RVH in 2013, but saw a drop in the 2014 split.

Price particularly struggled this past season on shots >12 feet and <= 22.5°. He gave up more goals from that area last season (5) than from 2010-2017 (4). This helps identify a particular pain-point in Price’s poor season. Bad angle goals are easily preventable, and going back to the tape would help identify whether tactics, luck, and/or laziness were at fault, which can help inform the proper adjustments.

If you are stricter and insist on a p-value of 0.01, only 2012 Brian Elliott in the ‘Close-Decent Angle’ area, 2012 Jonathan Quick in the ‘Long-Decent Angle’ area, and the aforementioned 2017 Price in the ‘Long-Poor Angle’ area saw changes significant at that level.

A weakness of this analysis is the ‘bucketing’ of data into specific areas, so it is possible a lot of borderline goals or shots from one area ended up in another by chance. Unlikely, but something to keep in mind.

Summary

Capturing the full impact of the RVH is a near impossible task since we don’t observe when it is actually deployed. But we can look at a proxy for when it might be deployed and investigate whether there was any meaningful impact on results. While incomplete, this might help us ask smarter questions and focus video review where it’s needed. Carey Price struggled from poor angles last season? Those clips can be isolated and analyzed to re-affirm the trend and possibly reveal why.

Analyzing tactical usage in hockey is often frustrating since everyone on the ice is basically playing a complicated version of rock-paper-scissors on skates trying to gain an advantage. We can observe goals, rebounds (kind of), and total attempts, but what if shooters have adapted by making more passes that lead to even more dangerous shots? There are rarely clean test and control cases we can use to attribute some change in results to a specific tactic.

We can, however, attempt to use data to help guide a more informed approach and use the framework above to begin to create and explore additional questions. Often looking from just a video perspective misses part of the equation. If you looked at all bad angle goals against when goalies were using the VH and compared them to the RVH, you wouldn’t have the complete picture. You would want to look at all bad angle shots using each tactic and then compare the success rate of each. Of course, we can’t do that easily, so identifying proxies and exploring the data can help paint a more comprehensive view and sharpen the focus on where meaningful differences may exist.

Goaltending can frustrate fans and coaches alike because results from game to game can be inconsistent. Goalies can’t necessarily dictate the game, rather have to let the game come to them while employing tactics that give them the best chance to succeed – ‘playing the percentages.’ The evolution of goaltending tactics has largely been positive, as save percentages suggest. It appears the RVH has probably helped goaltenders deal with bad angle attacks, but this isn’t a one-way effect. There is evidence to suggest that rebound rates are rising and some goaltenders have had notable falls in save percentage from poor angles. Shooters will always adapt, so it’s important for goaltenders to critically assess the tactics they employ and continue to stay a step ahead.

Thanks for reading! A notebook with code for the analysis can be found here. For any custom requests, ping me at @crowdscoutsprts or cole92anderson@gmail.com.

Code for this analysis was built on a scraper by @36Hobbit, which can be found at github.com/HarryShomer/Hockey-Scraper. I also implement the shot location adjustment outlined by Schuckers and Curro and adapted by @OilersNerdAlert. Any implementation issues are my fault. The rink plot is adapted from @iyer_prashanth’s code.

My code for this and other analyses can be found on my Github.

The Stanley Cup Playoffs magnify all the frustrations people have with the goaltending position. Goalies are tough to predict, yet their performances have an outsized impact on game outcomes, so what can you expect over a playoff series?

Goaltending is a volatile position. With so much out of the goaltender’s explicit control, it’s extremely difficult to consistently deliver positive results. This can be true over the course of a season, but it is especially true over the course of a playoff series. Generally, starting goaltenders in the playoffs have had good seasons, so at the margins there isn’t much separation between most starters, usually not enough to predictably manifest itself over 4 to 7 games.

But this helps frame the paradox of goaltending: tough to project, but few positions have more control over the outcome of a game. Looking at 2017-18 Wins Above Replacement, thanks to Corsica Hockey, 11 of the top 30 contributors were goaltenders. And that’s in aggregate: goaltenders don’t play every game of the regular season like they normally do in the playoffs. When normalizing by games played, goalies make up the majority of the top 30 (17), and 10 rank above the most impactful skater, Connor McDavid.

So to the cynical and casual fan alike, the playoffs can simply appear to be a competition in waiting to see which goaltender gets hot at the right time. And endless frustration and soul-searching when the opposite happens.

What’s the best (healthiest) way to think about what you’ll get from your goalie in the playoffs?

Taking It One Game At a Time (TIOGAT)

Most starting goalies in the playoffs have a pretty good body of work during the regular season. Some good games, some bad, but probably more good than bad if their team made the playoffs. We can calculate game-level performance by taking the difference between actual goals against and expected goals (the number of goals an average goalie historically would concede given what we know about those shots against, adjusting for rebounds, which goalies have some control over) and normalizing by total shots, a percentage difference between actual and expected sometimes referred to as deltaSv% or Save % Lift Over Expected. So, if a goalie faced 50 shots against totalling 4 expected goals but only gave up 3, that game would be 1 goal prevented on 50 shots, or 2 per 100 shots (2% better than expected). It’s also important to note that, unlike save %, expected goals attempts to weight shots by situation, so a 5v5 shot and a 5v3 shot can each be compared to their relative historical probabilities of being a goal (though not perfectly). Using all-situation results, as opposed to just even-strength, creates a more reliable metric.
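The per-game metric itself is just arithmetic; a quick sketch (the function name is mine):

```python
def sv_lift_per_100(goals_against, expected_goals, shots):
    """Goals prevented relative to expectation, per 100 shots
    (a.k.a. Sv% Lift Over Expected)."""
    return 100 * (expected_goals - goals_against) / shots

# The worked example from the text: 4 xG, 3 GA on 50 shots -> +2 per 100
lift = sv_lift_per_100(goals_against=3, expected_goals=4, shots=50)
```

The same formula reproduces the Gibson playoff figure cited below: (3.96 xG − 3 GA) / 57 shots is roughly +1.7 per 100.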

We can use a histogram to visualize the distribution of John Gibson’s 2017-18 performances, where one game is placed into each bin. I highlighted his first 2 playoff performances (as of 4/15/2018). About 63% of the time he made more saves than an average goalie would (a positive Sv% Lift Over Expected). In his 2 playoff games, he’s had 1 game where his Sv% was the same as we’d expect from an average goalie, given what we can quantify about shot quality against, and another about 1.7% better than expected ((3.96 xG – 3 GA) / 57 shot attempts).

However, having to bin each game is a little awkward, and to compare across multiple goalies the y-axis might need some scaling. Since most playoff goalies have 50-plus games this season, we can smooth and scale the distribution using a density curve, showing the probability of each outcome without losing too much information. Doing this smooths over Gibson’s lack of games with only slightly (~1-2%) better than expected results, which is partly chance and partly a result of binning, since he has plenty of games to the left and right.

Shuffling the Deck

Armed with game-level performances for each goalie, we turn to the playoffs, where each game is critical. We can use each goalie’s regular season results (partially attributable to team defensive performance) as a template of what to expect in the playoffs. Think of it as a deck of cards: we draw one card from a deck of that goalie’s performances and place it on the table. Do that again for the opposing goalie and their results. What does that look like after 4 to 7 games? What is the probability your goalie outplays the opposing goalie in a series?

With game-level performances, we can attempt to answer that. Below are Connor Hellebuyck’s and Devan Dubnyk’s regular season performances. When Dubnyk was good he was about as good as Hellebuyck, but when he was bad he was worse. In sum, Hellebuyck was better than Dubnyk on the year.

We can play this like you would the card game ‘War’ (with replacement, meaning a card, or game, goes back into the pile and can be randomly drawn again). Tracking who ‘wins’ and by about how much over a few thousand deals, we can figure out what percentage of the time we might expect Hellebuyck to outplay Dubnyk, or vice versa.

Using Hellebuyck’s and Dubnyk’s results, Hellebuyck outplays Dubnyk about 57% of the time. In a short series we likely wouldn’t notice the difference, and it’s entirely possible Dubnyk outplays Hellebuyck (as I’m writing this, that appears to be the case in Game 3), but in a game where the marginal probability of winning is small, and prediction possibly has an upper bound of 62% accuracy, this is probably a welcome advantage to Winnipeg. I’m assuming management and others with skin in the game would be interested in that edge.
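The ‘War’ simulation described above can be sketched in a few lines. This is a hedged toy version under my own assumptions (summed per-game lift decides the series winner; names and defaults are mine), not the exact notebook code:

```python
import random

def p_outplay(perf_a, perf_b, games=7, sims=10_000, seed=42):
    """Draw `games` per-game performances with replacement for each
    goalie and return the share of simulated series in which goalie
    A's summed performance beats goalie B's."""
    rng = random.Random(seed)  # seeded for reproducibility
    wins = 0
    for _ in range(sims):
        total_a = sum(rng.choice(perf_a) for _ in range(games))
        total_b = sum(rng.choice(perf_b) for _ in range(games))
        wins += total_a > total_b
    return wins / sims
```

Fed with each goalie’s actual list of game-level Sv% lifts, this returns the head-to-head probability quoted above; with toy inputs where one goalie strictly dominates, it returns 1.0.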

Looking across all series we can calculate the same probability. We can also overlay the 1 or 2 playoff games performances over the distribution of season results. We can see Matt Murray and Brian Elliott oscillate between their season’s best and worst and Frederik Andersen pull a card he didn’t even know he had.

What’s likely and what actually happens are 2 different things. But it helps to understand how likely something is, which can give some important context to the results of a game or two, even if those results might put your season in peril.

Note: I will be updating this plot during the playoffs on my twitter account (and not in low-quality gif format).

There are a few assumptions to address with this analysis:

• The Black Swan Game – Just because something isn’t in the data doesn’t mean it can’t happen. In game 2 against Boston, Frederik Andersen was pulled 12 minutes into the game, posting a save % 40% below expected, something he hadn’t done during the season. Part of this is artificial; he likely would have worked himself back into something less extreme by finishing the game. However, other games during the regular season where Andersen or any other goalie was pulled would functionally look the same, whether it was -40% or -20%: a loss is quite likely either way. We’re more interested in how often they shit the bed.
• Independence of Sampling – This also assumes goalies compartmentalize game performances, as opposed to some lagged effect where a bad game leads to a higher probability of a bad game the next time out. In the playoffs, it certainly feels this way, because if you draw 2 – 3 bad games in a row, that’s usually the end of the season. However, in aggregate the last game has little effect on the current game. Even controlling for workload, a simple linear model found no effect from one game to another. Each season for the 16 starters looks pretty flat.

However, this is still a little naive. Confidence, health, team play (with or without your best players) might mean some stretches are more favourable than others and have less relevance to the series or game at hand. Additionally, matching up against a single team might result in some otherwise minor details to be exploited, perhaps creating an even wider distribution of outcomes (to the frustration of all).

• The Playoffs and Regular Season are Comparable – Do teams really tighten up defensively in the playoffs? Do goalies generally step up and play better? Maybe, and if so it would be unwise to sample from regular season games, where shots were more likely to be dangerous due to pre-shot passing plays or screens and the goalie hadn’t really locked in yet. While goals actually do come a little harder in the playoffs (about 1 less goal than expected per 380 shots, or about 12 games), some of this is because the remaining goalies are, almost by definition, getting good results. Comparing goalies’ regular season results to their playoff results, there’s generally no lift from regular season performance to playoffs, but stronger goalies are likely to make it a little further.

On the other side of the ledger, for every goalie that is perceived to raise their game in the playoffs, another will struggle, due to some combination of luck, health, and psychology, but they don’t last long in the sample.

The Estimated Save Percentage Index Model

The most common metric used to measure goaltending performance is save percentage, the number of saves as a percentage of total shots on goal. This metric is fundamentally flawed. To more accurately understand the quality of a particular goaltender, save percentage must be more sophisticated. This is possible because the goaltending position has two important prerequisites that make performance the most quantifiable in hockey. First, the result is absolute: any shot on goal is either stopped or results in a goal. Second, the position is passive: the difficulty to the goaltender is generally dictated by the game in front of him, except for rebound control and puck handling, which can be addressed later in the model.

The Expected Save Percentage (ES% Index) is a predictor of a goaltender’s success based on a number of inputs that assign the individual difficulty of each shot the goaltender faces. The inputs used in the model are shot location, puck visibility, and the rate at which the puck changes angle before or during the shot. The model assumes the goaltender has NHL quality blocking-width, positioning, lateral movement, and reflexes. Then, through an array of formulas, the model determines the expected save percentage for each shot on goal given the inputs. Once these expected save percentages are aggregated over a game, or over a season, we can see how the goaltender’s actual save percentage compares with the expected save percentage, and compare the goaltender to his peers. The best goaltenders will consistently exceed the predicted save percentage whether they are facing 20 high quality shots or 40 lower quality shots. The Expected Save Percentage Index—the difference between real save percentage and expected save percentage—will measure the proficiency of the goaltender. The index can be tracked game-by-game and season-by-season. Since we are removing much of the fluctuation in team performance, we will have a much better idea of a goaltender’s consistency—an attribute critical to NHL success that can be lost in the potentially misleading statistics currently employed.

The inputs have been selected for simplicity and versatility. The most obvious is shot location—the closer the shot, the more likely it will be a goal. Assuming the average NHL shot is about 90 miles/hour and an NHL goaltender has a reaction time of .11 seconds, the Expected Save Percentage increases greatly once the shot is from a distance of greater than 15 feet. Inside of 15 feet, the model assumes the goaltender can cover around 70%-80% of the net through size and positioning, and the distance model reflects this assumption. Location also allows the model to determine the shot angle and net available to the shooter, two other factors that are automatically worked into the model. If applicable, visibility is a binary input determining whether the goaltender has a chance to see the puck. Again, since we are assuming NHL quality goaltending, there is no ‘half-screen’ or ‘distraction.’ If the goaltender has an opportunity to see the puck, they are expected to gain a sightline to the puck. If they are completely screened, the expected save percentage is lowered as a function of the net available when the shot is taken—the better the angle, the more dangerous the screen. Lastly, the model factors in the rate of change in the angle of the puck when the shot was taken, if applicable. This way we can discount the expected save percentage if the shot is a one-timer, deke, passing play, or even a deflection, to better reflect the difficulty of a shot against. The model assumes NHL quality lateral movement, edge control, and post-save recovery. At lower levels, where puck movement is slower, goaltenders will have to put up higher real save percentages to maintain an ES% Index that predicts NHL skills.
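The 15-foot threshold follows from simple arithmetic: a 90 mph shot travels 132 feet per second, so it covers 15 feet in roughly the assumed 0.11-second reaction time. A quick check:

```python
# Sanity check on the model's 15-foot threshold: at 90 mph a puck
# covers 15 feet in about the assumed 0.11 s reaction time.
SHOT_SPEED_FPS = 90 * 5280 / 3600  # 90 mph = 132 ft/s

def puck_travel_time(distance_ft, speed_fps=SHOT_SPEED_FPS):
    """Seconds for a shot to reach the goaltender."""
    return distance_ft / speed_fps
```

puck_travel_time(15) is about 0.114 s, so inside 15 feet the goaltender effectively cannot react and must rely on coverage and positioning, which is exactly the assumption the distance model encodes.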

These inputs create an admittedly arbitrary, yet sophisticated, expected save percentage. The formulas can be retrofitted as more data is collected to move closer to a universally accurate expected save percentage—ideally the median ES% Index would be 0. The data can then be broken up into three categories: shots with no screen or movement, shots that are screened, and shots where the puck is moving laterally as it is released. Breaking each shot into individual components will make it possible to track and eventually acquire objective data, replacing the placeholder formulas with actual NHL results. However, as it stands now, the expected save percentage is a benchmark, and it is the discrepancy between the realized and expected save percentage that will be the true measure of individual performance. Shot placement may seem like a troublesome omission from the model; however, since the model is built on aggregated averages, we can account for the complete distribution of shots put on net. NHL quality defense generally takes away time and space from shooters, limiting their ability to place the puck wherever they desire. Teams are not necessarily inclined to give up shots in a particular place in the net, but weaker teams are prone to giving up shots from more dangerous locations on the ice. In this way shot placement is indirectly built into the expected save percentage: on a shot from 10 feet out the shooter has a much greater chance of hitting a target, say high glove, than on a shot from 20 feet.

Win Contribution

The ES% Index measures goaltender performance in a vacuum, comparing actual performance to how we would expect him to perform in a given situation. However, the goaltender can influence the number of shots they face through rebound control and effective puck handling. Tracking these occurrences will allow the model to adjust the expected save percentage further. Easier-than-average shots that result in a rebound will lead to the successive shot not being factored into the model; this is analogous to saying the resulting shot should not have happened. Difficult shots that result in rebounds will take into consideration the difficulty of both shots when assigning expected save percentage to the potentially ‘preventable’ rebound shot. Whenever a goaltender handles the puck and it results in the puck directly clearing the zone, it will be assumed the goaltender prevented a shot a certain percentage of the time. By adding the potential shots to and removing preventable shots from the actual shot total, we will have a good idea of how the goaltender is helping their team and influencing the game.

With the expected save percentage and expected shots against, we can manufacture an expected goals against for each game. We can compare expected goals against to the goal support the goaltender received and determine whether or not the goaltender should have won the game. If the game should have been won based on the actual goals for and expected goals against, but was not, this will be a contributed loss. Conversely, if it was predicted the team should have lost, yet won, this will be a contributed win. So that we can remove the bias toward goaltenders on bad teams—who have more opportunity to register contributed wins—we can measure the number of potential contributed wins and losses and compare them to the actual contributed wins and losses.
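The contributed win/loss logic described above can be sketched as follows; the labels and the simple "goals for vs. expected goals against" comparison are my reading of the text, not a published implementation:

```python
def contributed_result(goals_for, goals_against, expected_goals_against):
    """Label a game from the goaltender's perspective by comparing the
    goal support received against the expected goals against."""
    should_win = goals_for > expected_goals_against
    did_win = goals_for > goals_against
    if should_win and not did_win:
        return "contributed loss"   # team scored enough, goalie didn't hold
    if did_win and not should_win:
        return "contributed win"    # goalie stole a game the team 'should' lose
    return None  # result matched expectation
```

For example, a team that scores 3 against an expected 2.1 goals against but loses 4-3 registers a contributed loss for its goaltender.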

How does this model predict future goaltending performance?

This analysis allows an NHL team to gain a concise, quantified measurement of goaltending performance across leagues and time. It will more accurately identify goaltending proficiency and consistency. It can be adjusted from league to league as the goaltender advances and will better predict future success as the database grows. The model automatically assumes each goaltender has NHL size, speed, and positioning, so if a goaltender can consistently perform better than his peers, he will likely continue to outperform them at higher levels. This can apply to a late round pick playing on a weak team in Europe or a college goaltender discredited for being on a strong defensive team. Since the ES% Index can be broken into components—stationary shots, screened shots, and moving shots—it will be easy to identify weaknesses that may be hidden by a specific team. For example, a goalie with poor lateral movement on a team that limits puck movement might perform well by traditional standards, but if his ES% Index on shots with puck movement is below average, chances are he will be exposed at the next level. There is a very real advantage to employing increasingly accurate goaltending metrics that other teams are not using to value goaltenders. The index can also be broken up into individual components, lending itself to in-depth analysis of goaltending prospects, opposition goaltenders, and even the performance of other players on the ice. While the ES% Index will likely have limitations, predicting the development and value of goaltenders has not improved during an era when the quality of goaltending has increased dramatically. Therefore, a more accurate metric will almost certainly improve the valuation of each goaltender and offer critical insights into their development.

Other Considerations

While advanced goaltending metrics can aid management decisions, they can also lend coaches a helpful perspective when preparing for games. The objective ES% Index will help explain some of the volatility in goaltender performance. Coaches do not always understand the subtleties of the position; their only concern lies in the proficiency of the goaltender in preventing goals—exactly the intent of the ES% Index. It can also be used as a pre-scout for opposing goaltenders. Situational success rates for each NHL goalie are tracked through the season, offering a strategic advantage to the coaching staff and players. If an otherwise successful goaltender is performing below the norm on shots with puck movement, then this is a clear indication to move the puck before shooting. Ability can be judged based on data from an entire season rather than anecdotal observations. This is advantageous because the goaltending position is inconsistent by nature: one bad bounce or mental lapse can be the difference between a good game and a bad game. Watching a select few games of a goaltender will make it difficult to judge their true ability—no doubt part of the reason teams struggle to value goaltenders at the draft. It can also complement scouting reports. If a scout sees a particular trend or weakness in a goaltender’s game, there will be data available which can be used to verify or contradict the scout’s claims.

Additionally, goaltender performance can influence the statistics of players at other positions. Both a defenseman playing in front of poor goaltending and a goal scorer who faced an unlikely sequence of superb goaltending are going to have their statistics skewed. Adjusting these statistics for goaltending performance will give management a clearer idea of why a certain player's statistics might be deviating from expectations. For example, the model can be expanded to measure the difference between even-strength expected goals for and expected goals against for each player over the course of the game based on the data already being recorded. This type of analysis is separate from the ES% Index; however, having more accurate goaltending statistics would provide an organization another tool to properly evaluate players and put the absolute best product on the ice.

Conclusion

No statistical analysis can replace the comprehensive subjective evaluation performed by the most experienced hockey minds in the world. However, it can offer a fresh perspective and lend objective analysis to a position where contrarians can often be the most successful. The unorthodox goaltenders Tim Thomas and Dominik Hasek have remarkably won 8 of the last 17 Vezina Trophies awarded. Not only were they drafted in the 9th and 10th rounds, respectively, they did not even become starting goaltenders until ages 32 and 29, despite their success outside of the NHL. Very few understood how they stopped the puck, but both men clearly prevented goals. It is my hope that employing more advanced goaltending metrics can remove the biases that exist and pinpoint goal prevention, the sole objective of a goaltender. Due to my extensive knowledge of the position as both a student and a coach, the model has been constructed to reflect the complex simplicity of the position—Where is the shot from? Can I see it? Can I reach my optimal position?—while deducing the existence of attributes that are critical to NHL success: size, speed, positioning, lateral movement, and consistency. For these reasons, Expected Save Percentage Index and Win Contribution analysis manages to combine the qualitative and quantitative factors that are necessary to properly evaluate goaltenders, benefiting any team that employs these advanced metrics.

Expected Goals (xG), Uncertainty, and Bayesian Goalies

All xG model code can be found on GitHub.

Expected Goals (xG) Recipe

If you’re reading this, you’re likely familiar with the idea behind expected goals (xG), whether from soccer analytics, early work done by Alan Ryder and Brian MacDonald, or current models by DTMAboutHeart and Asmean, Corsica, Moneypuck, or things I’ve put up on Twitter. Each model attempts to assign a probability of each shot being a goal (xG) given the shot’s attributes: shot location, strength, shot type, preceding events, shooter skill, etc. There are also private companies supplementing these features with additional data (most importantly pre-shot puck movement on non-rebound shots and some sort of traffic/sight-line metric), but this data is not public or generated in real time, so it will not be discussed here.[1]

To assign a probability (between 0% and 100%) to each shot, most xG models likely use logistic regression, a workhorse among industry response models. As you can imagine, the critical aspect of an xG model, and any model, becomes feature generation: the practice of turning raw, unstructured data into useful explanatory variables. NHL play-by-play data requires plenty of preparation to properly train an xG model. I have made the following adjustments to date:

• Adjust for recorded shot distance bias in each rink. This is done by building a cumulative density function of shot distances from each team's away games and applying it to that team's home rink, in case the home scorer is biased. For example (with totally made-up numbers), when Boston is on the road their games see 10% of shots within 5 feet of the goal, 20% of shots within 10 feet of the goal, etc. We can adjust the shot distances in their home rink to match, since the aggregated biases of 29 away data-recorders should be smaller than those of a single Boston data-recorder. If at home in Boston only 10% of the shots were within 10 feet of the goal, we might suspect that the scorer in Boston is systematically recording shots further from the net than other rinks do. We assume a team's games produce similar event coordinates home and away, so we can transform the home distribution to match the away distribution. The chart below demonstrates how distributions can differ between home and away games, highlighting the probable bias of the Boston and NY Rangers scorers that season, which was adjusted for. Note we also don't necessarily want to transform by an average, since the bias is not necessarily uniform across the spectrum of shot distances.
• Figure out what events lead up to the shot, what zone they took place in, and the time lapsed between these events and the eventual shot while ensuring stoppages in play are caught.
• Limit to just shots on goal. Misses include information, but like shot distance they contain scorer bias. Some scorers are more likely to record a missed shot than others. Unlike shots on goal, where we have a recorded event that is merely biased, adjusting for misses would require ‘inventing’ occurrences in order to correct biases in certain rinks, which seems dangerous. It's best to ignore misses for now, particularly because the majority of my analysis focuses on goalies. Splitting the difference between misses caused by the goalie (perhaps through excellent positioning and a reputation for not giving up pucks through the body) and those caused by recorder bias seems like a very difficult task. Shots on goal test the goalie directly and hence will be the focus for now.
• Clean goalie and player names. Annoying but necessary – both James and Jimmy Howard make appearances in the data, and they are the same guy.
• Determine the strength of each team (powerplay for or against, or whether the goaltender is pulled for an extra attacker). There is a tradeoff here. The coefficients for the interaction of states (i.e. 5v4, 6v5, 4v3 modeled separately) pick up interesting interactions but show significant instability from season to season. For example, 3v3 went from a penalty-box-filled improbability to a common occurrence to finish overtime games. Alternatively, shooter strength and goalie strength can be modeled separately; this is more stable but less interesting.
• Determine the goaltender and shooter handedness and position from look-up tables.
• Determine which end of the ice (positive or negative coordinates) the home team defends, using the events recorded in any given period, and rink-adjust coordinates accordingly.
• Calculate shot distance and shot angle. Determine what side of the ice the shot is from and whether or not it is the shooter's off-wing, based on handedness.
• Tag shots as rushes or rebounds, and if a rebound, how far the puck travelled and its angular velocity from shot 1 to shot 2.
• Calculate ‘shooting talent’ – a regressed version of shooting percentage using the Kuder-Richardson Formula 21, employed the same way as in DTMAboutHeart and Asmean‘s xG model.
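The rink-distance adjustment in the first bullet amounts to a quantile mapping: push each home-rink distance through the home sample's empirical CDF, then through the away sample's inverse CDF. The sketch below is my own minimal illustration under that assumption, not the author's actual code; the function name and toy arrays are made up.

```python
import numpy as np

def adjust_home_distances(home_dist, away_dist):
    """Quantile-map home-rink shot distances onto the away distribution.

    For each home shot, find its percentile within the home sample,
    then replace it with the away-sample value at that percentile.
    Assumes a team's true shot-distance profile is similar home and
    away, so a systematic gap reflects home-scorer bias.
    """
    home_dist = np.asarray(home_dist, dtype=float)
    away_dist = np.sort(np.asarray(away_dist, dtype=float))
    # Empirical-CDF percentile of each home shot within the home sample
    ranks = np.argsort(np.argsort(home_dist))
    pct = (ranks + 0.5) / len(home_dist)
    # Away-distribution value (inverse CDF) at those percentiles
    return np.quantile(away_dist, pct)
```

Because the mapping is quantile-by-quantile rather than a single average shift, a bias that only affects, say, close-range shots is corrected without distorting the rest of the distribution.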

All of this is to say there is a lot going on under the hood; the results rely on the data being recorded, processed, adjusted, and calculated properly. Importantly, the cleaning and adjustments to the data will never be complete—there will always be issues that haven't been discovered or adjusted for yet. There is no perfect xG model, nor is it possible to create one from the publicly available data, so it is important to concede there will be some errors; the goal is to prevent systemic errors that might bias the model. But these models do add useful information that raw shot attempt models cannot, creating results that are more robust and useful, as we will see.

Current xG Model

The current xG model does not use all developed features. Some didn’t contain enough unique information, perhaps over-shadowed by other explanatory variables. Some might have been generated on sparse or inconsistent data. Hopefully, current features can be improved or new features created.

While the xG model will continue to be optimized to better maximize out-of-sample performance, the discussion below captures a snapshot of the model. All cleanly recorded shots from 2007 to present are included, randomly split into 10 folds. Each of the 10 folds was then used as a testing dataset (checking to see if the model correctly predicted a goal or not by comparing it to actual goals) while the other 9 folds were used to train the model. In this way, all reported performance metrics consist of comparing model predictions on the unseen data in the testing dataset to what actually happened. This is known as k-fold cross-validation and is fairly common practice in data science.
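The splitting procedure above can be sketched in a few lines. This is a generic k-fold illustration (the function name and seed are my own), not the repository's actual implementation:

```python
import numpy as np

def kfold_indices(n_shots, k=10, seed=42):
    """Randomly split shot indices into k folds. Each fold serves once
    as the unseen test set while the other k-1 folds train the model,
    exactly as described in the text."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_shots)
    folds = np.array_split(idx, k)
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, test
```

Every shot lands in exactly one test fold, so the reported metrics are computed entirely on data the model never saw during training.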

When we rank-order the predicted xG from highest to lowest probability, we can compare the share of goals that occur to shots ordered randomly. This gives us a gains chart, a graphic representation of how well the model finds actual goals relative to selecting shots randomly. We can also calculate the Area Under the Curve (AUC), where 1 is a perfect model and 0.5 is a random model. Think of the random model in this case as shot attempt measurement, treating all shots as equally likely to be a goal. The xG model has an AUC of about 0.75, which is good, and safely in between perfect and random. The most dangerous 25% of shots as selected by the model make up about 60% of actual goals. While there's irreducible error and model limitations, in practice it is an improvement over unweighted shot attempts and accumulates meaningful sample size quicker than goals for and against.
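Both summary numbers can be computed directly from predicted probabilities. The sketch below uses the rank-statistic (Mann-Whitney) form of AUC; the function name and toy data are hypothetical, and ties are ignored for simplicity:

```python
import numpy as np

def top_share_and_auc(xg, goal):
    """Gains-chart summary: share of actual goals found in the top 25%
    of shots ranked by xG, plus AUC computed as the probability that a
    random goal outranks a random non-goal (Mann-Whitney statistic)."""
    xg, goal = np.asarray(xg, float), np.asarray(goal, int)
    order = np.argsort(-xg)                  # most dangerous shots first
    top = order[: max(1, len(xg) // 4)]      # top quarter of shots
    share = goal[top].sum() / goal.sum()
    ranks = np.argsort(np.argsort(xg)) + 1   # 1-based ranks by xG
    n_pos, n_neg = goal.sum(), (1 - goal).sum()
    auc = (ranks[goal == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return share, auc
```

Run on the real shot data, `share` would be the ~60% figure and `auc` the ~0.75 quoted above.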

Hockey is also a zero-sum game. Goals (and expected goals) only matter relative to league average. Original iterations of the expected goal model, built on a decade of data, showed that goals were becoming dearer compared to what was expected. Perhaps goaltenders were getting better, or league data-scorers were recording events to make things look harder than they were, or defensive structures were impacting the latent factors in the model, or some combination of these explanations.

Without the means to properly separate these effects, each season receives its own weights for each factor. John McCool had originally discussed season-to-season instability of xG coefficients. Certainly this model contains some coefficient instability, particularly in the shot-type variables. But overall these magnitudes adjust to equate each season's xG to actual goals. Predicting a 2017-18 goal would require additional analysis and smartly weighting past models.

xG in Action

Every shot has a chance of going in, ranging from next to zero to close to certainty. Each shot in the sample is there because the shooter believed there was some benefit to shooting rather than passing or dumping the puck, so we don't see a bunch of shots from the far end of the rink, for example. xG then assigns a probability to each shot of being a goal, based on the explanatory variables generated from the NHL data (shot distance, shot angle, whether the shot is a rebound, and so on, as listed above).

Modeling each season separately, total season xG will be very close to actual goals. This also grades goaltenders on a curve against other goaltenders each season. If you are stopping 92% of shots but others are stopping 93% (assuming the same quality of shots), then you are, on average, costing your team a goal every 100 shots. This works out to about 7 points in the standings, assuming a 2,100-shot season workload and that an extra 3 goals against will cost a team 1 point in the standings. Using xG to measure goaltending performance makes sense because it puts each goalie on equal footing as far as what is expected, based on the information that is available.
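The arithmetic in that example can be captured in a one-line helper. The name is mine; the 2,100-shot workload and 3-goals-per-point conversion are the article's stated rules of thumb:

```python
def standings_points(sv_gap_per_100, shots=2100, goals_per_point=3):
    """Convert a save-percentage gap (goals allowed per 100 shots above
    or below average) into standings points, using the article's rules
    of thumb: a 2,100-shot season and ~3 goals per standings point."""
    extra_goals = sv_gap_per_100 * shots / 100
    return extra_goals / goals_per_point
```

So a goalie one percentage point below average (1 extra goal per 100 shots) costs roughly `standings_points(1)` = 7 points over a starter's season.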

We can normalize the number of goals prevented by the number of shots against to create a metric: Quality Rules Everything Around Me (QREAM), Expected Goals minus Actual Goals per 100 shots. Splitting each goalie season into random halves allows us to look at the correlation between the two halves. A metric that captured 100% skill would have a correlation of 1. If a goaltender prevented 1 goal every 100 shots, we would expect to see that hold up in each random split. A completely useless metric would have an intra-season correlation of 0; picking numbers out of a hat would re-create that result. With that frame of reference, intra-season correlations for QREAM are about 0.4, compared to about 0.3 for raw save percentage. Pucks bounce, so we would never expect to see a correlation of 1; this lift is considered useful and significant.[2]
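A split-half QREAM calculation might look like the following sketch. The function names and the random split are my own, not the author's code; correlating the two halves across many goalie seasons is what produces the ~0.4 figure:

```python
import numpy as np

def qream(xg, goals, per=100):
    """QREAM: (expected goals - actual goals) per 100 shots."""
    xg, goals = np.asarray(xg, float), np.asarray(goals, float)
    return (xg.sum() - goals.sum()) / len(xg) * per

def split_half_qream(xg, goals, seed=0):
    """Randomly split one goalie-season's shots into two halves and
    return QREAM for each half. Correlating these pairs across many
    goalie seasons measures the metric's repeatability."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(xg))
    a, b = idx[::2], idx[1::2]
    xg, goals = np.asarray(xg, float), np.asarray(goals, float)
    return qream(xg[a], goals[a]), qream(xg[b], goals[b])
```

A goalie facing 100 shots worth 0.1 xG each while allowing only 5 goals has a QREAM of +5: five goals prevented per 100 shots.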

Crudely, each goal prevented is worth about 1/3 of a point in the standings. Estimating how many goals a goalie prevents compared to average therefore allows us to compute how many points a goalie might create for or cost their team. However, a more sophisticated analysis might compare the goal support the goalie receives to the expected goals faced (a bucketed version of that analysis can be found here). Using a win probability model, the goalie's impact on winning or losing can be framed as actual wins versus expected.

Uncertainty

Expected goals are also important because they begin to frame the uncertainty that goes along with goals, chance, and performance. What does the probability of a goal represent? Think of an expected goal as a coin weighted to represent the chance that shot is a goal. Historically, a shot from the blueline might end up a goal only 5% of the time. After 100 shots (or coin flips), will there be exactly 5 goals? Maybe, but maybe not. The same goes for a rebound in tight to the net with a goal probability of 50%. After 10 shots, we might not see 5 goals scored, like 'expected.' Five goals is the most likely outcome, but anywhere from 0 to 10 is possible on only 10 shots (or coin flips).

We can see how actual goals and expected goals might deviate in small sample sizes, from game to game and even season to season. Luckily, we can use programs like R, Python, or Excel to simulate coin flips or expected goals. A goalie might face 1,000 shots in a season, giving up 90 goals. With historical data, each of those shots can be assigned a probability of a being a goal. If the average probability of a goal is 10%, we expect the goalie to give up 100 goals. But using xG, there are other possible outcomes. Simulating 1 season based on expected goals might result in 105 goals against. Another simulation might be 88 goals against. We can simulate these same shots 1,000 or 10,000 times to get a distribution of outcomes based on expected goals and compare it to the actual goals.
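That simulation is straightforward to sketch in Python. The function name is hypothetical and the numbers mirror the example above (1,000 shots averaging 10% goal probability):

```python
import numpy as np

def simulate_seasons(shot_probs, n_sims=10000, seed=1):
    """Simulate season goal totals by flipping each shot's weighted
    coin. Returns one simulated goals-against total per run, giving
    the distribution of outcomes consistent with the same shots."""
    rng = np.random.default_rng(seed)
    p = np.asarray(shot_probs, float)
    # Each row is one simulated season; compare a shot's probability
    # to a uniform draw and count the 'goals'
    return (rng.random((n_sims, len(p))) < p).sum(axis=1)
```

Simulating `np.full(1000, 0.10)` this way centers near 100 goals against, but individual seasons routinely land in the high 80s or low 110s, which is exactly the spread against which the goalie's actual 90 should be judged.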

In our example, the goalie possibly prevented 10 goals on 1,000 shots (100 xGA – 90 actual GA). But they also may have prevented 20 or prevented 0. With expected goals and simulations, we can begin to visualize this uncertainty. As the sample size increases, the uncertainty decreases but never evaporates. Goaltending is a simple position, but the range of outcomes, particularly in small samples, can vary due to random chance regardless of performance. Results can vary due to performance (of the goalie, teammates, or opposition) as well, and since we only have one season that actually exists, separating the two is painful. Embracing the variance is helpful and expected goals help create that framework.

It is important to acknowledge that results do not necessarily reflect talent or future or past results. So it is important to incorporate uncertainty into how we think about measuring performance. Expected goal models and simulations can help.

Bayesian Analysis

Luckily, Bayesian analysis can also deal with weighing uncertainty and evidence. First, we set a prior: a probability distribution of expected outcomes. Brian MacDonald used mean Even Strength Save Percentage as his prior, the distribution of ESSV% across NHL goalies. We can do the same thing with Expected Save Percentage ((shots − xG) / shots), creating a unique prior distribution of outcomes for each goalie season depending on the quality of shots faced and the sample size we'd like to see. Once the prior is set, evidence (saves, in our case) is layered onto the prior, creating a posterior distribution.

Imagine a goalie facing 100 shots to start their career and, remarkably, making 100 saves. They face 8 total xG against, so we can set the Prior Expected Save% as a distribution centered around 92%. The current evidence at this point is 100 saves on 100 shots, and Bayesian Analysis will combine this information to create a Posterior distribution.

Goaltending is a binary job (save/goal), so we can use a beta distribution to model the goaltender's expected (prior) and actual (evidence) save percentage between 0 and 1, just as a baseball player's batting average falls between 0 and 1. We also have to set the strength of the prior—how robust the prior is to the new evidence coming in (the shots and saves of the goalie in question). A weak prior would concede to evidence quickly; a hot streak to start a season or career might lead the model to think the goalie is a Hart candidate or future Hall-of-Famer! A strong prior would assume every goalie is average and require prolonged over- or under-achieving to convince the model otherwise—possibly fair, but not revealing any useful information until it has been common knowledge for a while.

More research is required, but I have set the default prior strength to the equivalent of 1,000 shots. Teams give up about 2,500 shots a season, so a 1A/1B-type goalie would exceed this threshold in most seasons. In my goalie compare app, the prior can be adjusted up or down as a matter of taste or curiosity. Further research could investigate what prior shot count minimizes season-to-season performance variability.
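Under a beta-binomial model, prior strength reduces to pseudo-count arithmetic: a prior "worth" 1,000 shots contributes 1,000 imaginary shots at the expected save percentage, and each real save or goal adds one count. A minimal sketch under that assumption (the function name is mine), using the 92%/100-save example from the text:

```python
def posterior_save_pct(prior_mean, prior_shots, saves, shots):
    """Beta-binomial update for save percentage.

    The prior is Beta(a, b) with a = prior_mean * prior_shots
    pseudo-saves and b the complementary pseudo-goals. Each observed
    save adds 1 to a; each goal against adds 1 to b."""
    a = prior_mean * prior_shots + saves            # saves (real + pseudo)
    b = (1 - prior_mean) * prior_shots + (shots - saves)  # goals against
    return a / (a + b)                              # posterior mean save%
```

With a 1,000-shot prior centered at 92%, a perfect 100-for-100 start moves the posterior mean only to 1020/1100 ≈ 92.7%: impressed, but not yet proclaiming a Hall-of-Famer. Weakening the prior to 100 shots would drag the same evidence up to 96%.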

Every time a reported result activates your small-sample-size spidey senses, remember that Bayesian analysis is thoroughly unimpressed, dutifully collecting evidence, one shot at a time.

Conclusion

Perfect is often the enemy of the good. Expected goal models fail to completely capture the complex networks and inputs that create goals, but they improve on current results-based metrics such as shot attempts by a considerable amount. Their outputs can be conceptualized by fans and players alike: everybody understands a breakaway has a better chance of being a goal than a point shot.

The math behind the model is less accessible, but people, particularly the young, are becoming more comfortable with prediction algorithms in their daily lives, from Spotify-generated playlists to Amazon recommender systems. Coaches, players, and fans on some level understand not all grade-A chances will result in a goal. So while out-chancing the other team in the short term is no guarantee of victory, doing it over the long term is a recipe for success. Removing some of the noise that goals contain, and the conceptual flaws of raw shot attempts, helps smooth the short-term disconnect between performance and results.

My current case study using expected goals is to measure goaltending performance since it’s the simplest position – we don’t need to try to split credit between linemates. Looking at xGA – GA per shot captures more goalie specific skill than save percentage and lends itself to outlining the uncertainty those results contain. Expected goals also allow us to create an informed prior that can be used in a Bayesian hierarchical model. This can quantify the interaction between evidence, sample size, and uncertainty.

Further research topics include predicting goalie season performance using expected goals and posterior predictive distributions.

____________________________________________

[1] Without private data or comprehensive tracking technology, analysts are only able to observe the outcomes of plays—most importantly goals and shots—but not what created those results. A great analogy comes from football (soccer) analyst Marek Kwiatkowski:

Almost the entire conceptual arsenal that we use today to describe and study football consists of on-the-ball event types, that is to say it maps directly to raw data. We speak of “tackles” and “aerial duels” and “big chances” without pausing to consider whether they are the appropriate unit of analysis. I believe that they are not. That is not to say that the events are not real; but they are merely side effects of a complex and fluid process that is football, and in isolation carry little information about its true nature. To focus on them then is to watch the train passing by looking at the sparks it sets off on the rails.

Armed with only ‘outcome data’ rather than comprehensive ‘input data,’ most models will be best served by logistic regression. Logistic regression often bests complex models, generalizing better than machine learning procedures. However, it will become important to lean on machine learning models as reliable ‘input’ data becomes available, in order to capture the deep networks of effects that lead to goal creation and prevention. Right now we only capture snapshots, so logistic regression should perform fine in most cases.

[2] Most people readily acknowledge some share of results in hockey is luck. Is the number closer to 60% (given the repeatable skill in my model is about 40%), or can it be reduced toward 0% because my model is quite weak? The current model can be improved with more diligent feature generation and by adding key features like pre-shot puck movement and some sort of traffic metric. This is interesting because logistic regression models traditionally see diminishing marginal returns from adding more variables, so while I am missing 2 big factors in predicting goals, the intra-seasonal correlation might only go from 40% to 50%. However, deep learning networks that can capture deeper interactions between variables might see an outsized benefit from these additional ‘input’ variables (possibly capturing deeper networks of effects), pushing the correlation and skill capture much higher. I have not attempted to predict goals using deep learning methods to date.

Goaltending—Game Theory, the Contrarian Position, and the Possibility of the Extreme

Preamble: The following is a paper I wrote while in college about 6 years ago. It takes a slightly different approach, with worse logic, than I employ now, likely reflecting my attitude at the time—a collegiate goaltender with the illusion of control (hence goals were likely unpredictable events; otherwise I would have stopped them). I have softened on this thinking, but still think the recommendation holds: goaltenders can outperform the average by mixing strategies and adding an element of unpredictability to their game.

How goaltender strategy and understanding randomness in hockey can lend insight into the success of truly elite goaltenders.

Introduction

This paper outlines general strategies and philosophies behind goaltending, focusing on what makes great goaltenders great. Philosophy and goaltending make interesting partners—few athletic positions are continuously branded with a ‘style.’ Since such subjective labels are the norm for this position, I feel quite comfortable using the terms rather broadly in a philosophical analysis. I will use loose generalisations to formulate a big-picture view of the position—how it has evolved, the type of goaltender that has consistently risen above their peers during this evolution, and why. Using game theory and attempting to clearly label player strategies is, at times, clumsy. Addressing the impact of unquantifiable randomness in hockey does not provide much comfort either. However, the purpose is to encourage further thought on the subject, not to provide a numerical, concise answer. It is a question that deserves more thought at both the professional (evaluation and scouting) and grass-roots (development and training) levels. The question: what makes a consistently great goaltender?

Game Theory—The Evolution of Goaltending Strategy

Passive ‘blocking’ tactics have become prevalent among goaltenders at all levels. It is simple, statistically successful, and passive. There are tradeoffs like any strategy—the goaltender forfeits aggressiveness in order to force the shooter to make perfect shots to beat them. This ‘fated’ strategy exposes the goaltender to the extreme—most goals allowed are classified as ‘great plays’ or ‘lucky,’ certainly not the fault of the goaltender. However, there are other considerations. Shooters, no doubt, have adjusted their strategy based on this approach, further compromising the passive approach to goaltending. This means a disproportionate number of shooters will look to make ‘perfect’ shots—high and tight to the post against a blocking goaltender—despite the risk of missing the net entirely.

Historically, goaltenders did not have the luxury of light, protective equipment that is designed specifically to seal off any holes while in a butterfly position. Equipment lacking proper protection and effectiveness required goaltenders to spend the majority of the time on their feet while facing shots.

Player/Goaltender Interactions Then and Now

Game theory applications allow a crude analysis of the evolution of strategies between players and goaltenders. The numbers I use are arbitrary; however, they demonstrate an important strategic shift in goaltending tactics. First, let us assume that players have to decide whether to shoot high or low and always try to shoot for the posts. Simultaneously, goaltenders must choose to block or react.

In the age of primitive equipment, goaltenders were required to stand up most of the time to make saves. From here we can make three assumptions in this ‘game’ or ‘shot’: 1) While blocking, the goaltender’s expected success rate was the same whether the shooter shot high or low. Since the ‘blocking’ tactic was simply standing up and challenging when possible, it would not matter if the player shot high or low; the goaltender was simply covering the middle of the net. 2) While reacting, high shots were easier saves than low shots. Goaltenders generally stood up, which made reaching pucks with the hands easy and reaching pucks with the feet hard. 3) Goaltenders were still better off reacting than blocking on low shots, since players will always shoot for the posts.

We can then use the iterated elimination of dominated strategies technique to find a dominant strategy for each player. In this scenario, goaltenders are always more successful, on average, reacting than blocking. Since goaltenders will always react, shooters acknowledge they are generally better off shooting low than high (while this is just a fabricated example, the fact goaltenders survived without helmets might support it). Regardless, this exercise demonstrates that goaltenders needed the ability to react to shots during this time. These strategies and the expected save percentages are displayed in the matrix below (Figure 1). Remember, goaltenders want the highest save-percentage strategy, while shooters want to find the lowest.

However, the game of hockey is not as simple as the pure simultaneous-move game we have set up. Offensive players are not shooting in a vacuum. They are often facing defensive pressure or limited to long-distance shots, both of which limit their ability to accurately shoot the puck. If the goaltender believes his team will be able to limit the frequency of high shots to less than 50%, then the goaltender's expected save percentage while blocking is greater than their expected save percentage while reacting.

Advances in equipment then allowed the adoption of a new blocking tactic—the butterfly. By dropping to their knees and flaring out their legs, goaltenders maximised their blocking surface area, particularly along the ice. Equipment was lighter, bigger, and increasingly conducive to the butterfly style, allowing goaltenders to perform at higher levels. Now the same simultaneous-move game described above began to increasingly favour the goaltender. Not only did the butterfly change the way goaltenders blocked, it changed the way they reacted. Goaltenders now tended to react from a butterfly base—dropping to their knees at the onset of the shot and reacting as they dropped. The effectiveness of the down game now meant shooters were always better off shooting high. In a pure game-theory sense, this would suggest players would always shoot high, so goaltenders should still always react. These strategies and the new payoffs are displayed in Figure 2.
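The 50% threshold can be illustrated with a toy payoff calculation. The save rates below are entirely hypothetical (the paper's own figures are arbitrary too) and are chosen so the two tactics break even when exactly half the shots are high:

```python
def expected_save_pct(p_high, sv):
    """Expected save percentage for one tactic given the share of
    shots that come in high. `sv` maps 'high'/'low' to that tactic's
    save rate against each shot type."""
    return p_high * sv["high"] + (1 - p_high) * sv["low"]

# Hypothetical payoffs: blocking seals the ice but is weak up high;
# reacting handles high shots well but gives up more along the ice.
BLOCK = {"high": 0.85, "low": 0.95}
REACT = {"high": 0.92, "low": 0.88}
```

With these numbers, if defensive pressure holds high shots to 40% of attempts, blocking yields .910 versus .896 for reacting; at 60% high shots, reacting wins .904 to .890. The break-even point is exactly 50%.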

This suggests that goaltenders with a good defence, good blocking technique, and modern goaltending equipment are better off blocking. When a goaltender is said to be ‘playing the percentages,’ this suggests the goaltender routinely blocks the majority of the net and forces the shooter to make a perfect shot. This strategy has raised the average performance of goaltenders. However, in a zero-sum game such as hockey, simply maintaining a level of adequate performance will not increase the goaltender’s absolute success, measured in wins and losses. The only way for a goaltender to positively impact their team is to exceed the average, which—as we will see—can be accomplished by defying the norm.

In conclusion, these strategic interactions did not create hard rules for goaltenders or shooters. However, the permeation of advanced tactics has heavily skewed the payoffs toward the goaltender. Goaltenders block more, and shooters shoot high as much as possible. An unspoken equilibrium has been created and maintained at all levels of hockey—thus altering the instinctive strategies employed by both groups.

The ‘Average’ Position

Goaltenders could now simplify their approach to the position, while simultaneously out-performing their historical predecessors. The average NHL save percentage rose from 87.6% in 1982 to 91.6% in 2011.* This rise in success rate would give any goaltender little incentive to break the norm. Imagine an ‘average’ goaltender, posting a save percentage equivalent to the NHL average save percentage each year. The ‘average’ goaltender would put up better numbers each successive year. While they would be perceived to be more valuable—higher personal statistics means a bigger contract, more starts, and a greater reputation—it is entirely conceivable that, despite their statistical improvement, they would not contribute to any more victories. If the goaltender at the other end of the ice is performing just as well as you (on average, of course), then the ‘average’ goaltender will not contribute any extra wins to his team compared to the year before. However, this effect would be difficult to observe over the course of a goaltender's career, and coaches and managers would become enamoured with ‘average’ goaltending, comparing it favourably to the recent past. The ‘success of mediocrity’ encouraged a simplified, safe, and ‘high-percentage’ approach to the position. If you looked like other goaltenders, played like other goaltenders, and performed like other goaltenders, there was little reason to worry about job security. In short, through the evolution of goaltending, goaltenders generally have had very little to gain from breaking the idyllic norm of how a goaltender should look or play. The implicit equilibrium between shooters and goaltenders has persisted across different eras—most recently centring around a ‘big butterfly, blocking’ game, resulting in historically superior statistics for the ‘average’ goaltender.

The Limits of Success

There is no doubt that the craft of goaltending is now significantly superior to the efforts that preceded it. Goaltenders today are bigger, faster, more athletic, and advanced technically. However, the quest to fulfil the requirement of ‘average’ will be an empty pursuit in absolute terms (wins and losses) for any goaltender. In order to avoid becoming ‘average,’ the goaltender must deviate from the strategic equilibrium that primarily consists of large goaltenders simply ‘playing the percentages.’ While goaltenders can exceed the average by simply being even bigger, faster, and more athletic than their peers, this is becoming increasingly difficult. Not only will teams continue to draft goalies for these attributes, there are natural limits to how tall, fast, and coordinated a human being can be. Shooters will also continue to adjust. An extra 2” in height does not necessarily prevent a perfectly placed shot over or under the glove. Recall the oversimplified simultaneous-move game: shooters will always be better off shooting high and to the posts—when they have time. High-level shooters have evolved to target very specific areas of the net, preying on the predictability of the modern butterfly goalie. However, the shooter will not always have time to attempt the perfect shot, which means the goaltender can revert back to primarily blocking and mediocrity without being exposed.

The Contrarian Position

While the goaltender cannot change their physiology in order to exceed the average, they can (slowly) alter their approach to the game. Remember, the strategic interaction between the goaltender and shooter has become predictable. The goaltender will fill up as much net as possible, forcing the shooter to manufacture a perfect shot, and the shooter will attempt to comply. If a goaltender were to begin to mix strategies effectively and react some percentage of the time, they would be better off. The shooter has been trained to shoot high (that is their dominant strategy), and goaltenders are better off reacting to high shots than blocking and leaving their arms pinned to their sides. Essentially, by mixing strategies when it is wise (when the simple block–react instantaneous-move model applies), the goaltender can increase their expected save percentage—and exceed the average.
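The mixing logic can be sketched as a small game-theory exercise. The save probabilities below are purely illustrative assumptions, not measured values; the point is only that the goaltender's optimal ‘react’ frequency is whatever leaves the shooter indifferent between shooting high and low:

```python
# Hypothetical save probabilities for a 2x2 shooter-vs-goaltender game.
# Keys: (goaltender strategy, shot placement). Illustrative numbers only.
save_prob = {
    ("block", "high"): 0.80,  # blocking covers low shots well, high shots less so
    ("block", "low"):  0.95,
    ("react", "high"): 0.92,  # reacting tracks high shots better
    ("react", "low"):  0.88,
}

def shooter_goal_prob(p_react, shot):
    """Goal probability for a shot placement, given the goalie reacts with prob p_react."""
    p_block = 1 - p_react
    save = p_block * save_prob[("block", shot)] + p_react * save_prob[("react", shot)]
    return 1 - save

# The goaltender's equilibrium mix makes the shooter indifferent:
#   (1-p)*0.20 + p*0.08 = (1-p)*0.05 + p*0.12   =>   p = 0.15 / 0.19
p_star = (save_prob[("block", "low")] - save_prob[("block", "high")]) / (
    (save_prob[("block", "low")] - save_prob[("block", "high")])
    + (save_prob[("react", "high")] - save_prob[("react", "low")])
)
print(f"equilibrium react frequency: {p_star:.3f}")
```

With these assumed numbers the goaltender should react roughly four times out of five. The exact figure is an artifact of the made-up payoffs; the structural point is that any pure ‘always block’ strategy hands the shooter a dominant response, exactly as the text argues.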

To demonstrate this point we must move away from the abstract and the general, focusing on specific examples. A disproportionate amount of the statistical success throughout the ‘butterfly’ era has been the work of unorthodox goaltenders. While an ‘unorthodox’ style has had a negative connotation in the conventional world of goaltending, it is the defectors who have broken through the limits reached by the big, butterfly goaltender. Sub-six-foot Tim Thomas recently broke the modern NHL save percentage record by willing himself to saves and largely defying established goaltending practice. The save percentage record previously belonged to Dominik Hasek. Like Thomas, Hasek was less than six feet tall and would consistently move toward the puck like no other goaltender in the game. For shooters with very clear, habitual objectives (shoot high glove, or low blocker just over the pad, or through the legs if the goaltender is sliding), facing these contrarians led to a historically low success rate. These athletes effectively mixed their strategies between blocking and reacting (their own versions of these strategies, mind you) to keep shooters guessing. Their contrarian approach has been remarkably sustainable as well—Hasek and Thomas have combined to win 8 of the last 17 Vezina Trophies, despite their NHL careers overlapping for only 3 years. By moving further away from the archetypal goaltender, both Thomas and Hasek exceeded the average considerably. It is exceeding the average that causes goaltenders to contribute to victories, the absolute measurement of success for any goaltender.

Consider the correlation between a unique approach and sustained success when assessing the careers of four Calder Trophy-winning goaltenders: Ed Belfour, Martin Brodeur, Andrew Raycroft, and Steve Mason. Each began his NHL career in impressive fashion; however, two went on to become generational goaltenders, while the other two will struggle to equal their initial success. This may seem like an unfair comparison, but it is important to understand why it is unfair. Both Brodeur and Belfour maintained an elite level of play because they generally defied convention throughout their careers. Both played unique styles and were excellent puck handlers. When Belfour entered the league at the very start of the 1990s, his combination of athleticism, intensity, and an advanced understanding of positional play made him formidable. He mastered the butterfly before it was the standard—you could argue the success of Patrick Roy and Belfour helped create the current generation of ‘big, butterfly’ goaltenders. Brodeur has always been different—there has been no comparable goaltender to him throughout his career, just like Thomas or Hasek. He has been the most consistent and celebrated goaltender in NHL history without utilising the most common save tactic employed by his peers—he rarely drops into a true butterfly. Counter-intuitively, despite lacking the standard, universal save movement, he has been remarkably consistent. Martin Brodeur has mixed his save-selection strategies magnificently, preying on shooters programmed to shoot against predictable butterfly practitioners.

Now consider the other rookie standouts: Raycroft and Mason. It is difficult to distinguish their approach to the game from that of other ‘average’ professionals. Mason is taller than average and catches right, but he does not present a unique challenge to shooters. They are goaltenders with an average, ‘percentage-based’ approach to goaltending. There is nothing noteworthy about the way they play the position. Why the initial success? Both goaltenders likely overachieved (a positive deviation from the average) due to a favourable situation and the vague element of surprise. Shooters would soon adjust to the subtleties in each young goaltender’s game.* Personal weaknesses would be exploited and their performance would regress towards the mean. Their rookie years could have been duplicated by a number of other rookie goaltenders with similar skill and luck. Their ‘average’ size, skill set, and approach to the game have manifested themselves in ‘average’ NHL careers. An impressive beginning was nothing more than favourable luck and circumstance—their careers diverged significantly from those of the other Calder-winning goaltenders. The goaltenders who spent their careers masterfully mixing save-selection strategies, by contrast, set the standard for consistency, longevity, and performance.

In conclusion, the modern equilibrium between goaltenders and shooters has been successfully disrupted by contrarians like Dominik Hasek, Tim Thomas, and Martin Brodeur. The rest have enjoyed the benefits of the ‘big, butterfly goaltender’ doctrine—stopping more pucks on average—but have gained little ground on other ‘average’ goaltenders. These goaltenders are playing a strategy that contributes little to their team, because it leaves them more susceptible to the extreme.

The Possibility of the Extreme—The Black Swan Save

If contrarians exceed the average, it is important to understand how they do it with remarkable consistency. I believe their unconventional style and willingness to react to shots leave them better prepared to handle the possibility of the statistically unique shot—which I will call a ‘Black Swan’ opportunity.§ They can always use the butterfly tactic in situations that call for it, while butterfly-reliant goaltenders struggle to improvise like contrarians. The ‘reaction’ strategy leaves them free to make the unconventional saves necessary to prevent Black Swans from becoming goals.

The position relies on instinct and split-second decisions. Reactions and responses to defined situations are drilled into goalies from an increasingly young age. Long before these goaltenders are capable of playing in the NHL, they have generally mastered technical responses to certain, finite situations. Goaltenders may be very well trained to react predictably in familiar circumstances, but this leaves them susceptible to the extreme—breeding mediocrity. In this case, the extreme, or Black Swan, shot is the result of 10 position players on the ice, moving at speeds up to 30 miles per hour, chasing an object that can move close to 100 miles per hour. Despite the simple objective and the definitive results of the goaltending position, every shot against has the potential to create an endless number of complexities and permutations. A one-dimensional approach to the position—where the goaltender determines they are better off ‘playing the percentages’—offers the goaltender the opportunity to make a large number of saves, but it does not prepare the goaltender to react favourably to a Black Swan. The problem, then, is not maintaining a predictable level of performance—making the saves ‘you should make’—it is the ability to adjust to the unpredictable and the extreme in order to make a critical save. This is accomplished by reacting to shots a healthy percentage of the time.

The real objective of the goaltender is to give up fewer goals than the opposing goaltender. In a low-scoring game such as hockey, a single goal against will often determine the outcome. Passively leaving the outcome up to chance is, in my opinion, a mistake. Aggressiveness and assertiveness are competitive qualities that are compromised by a predominantly butterfly style. By dropping into the butterfly, the goaltender is surrendering to whatever unlikely or unlucky shot may occur. A great play, a seeing-eye shot, or an unlikely bounce—the ‘unlikely, undrilled’ occurrences that have the potential to win or lose games—happen randomly. The goaltender must be aggressive and decisive in order to adjust to these situations. These are the shots that cannot be replicated in repetitive drills; they require the creativity and instinct of a contrarian.

Goaltending—A Lesson in Randomness

The frequency of the Black Swan shot or goal against is erratic; they can happen at any time. There is little correlation between shots against and goals against on a game-by-game basis. If we assume the number of Black Swans a goaltender faces is roughly proportional to the number of goals given up*—generally, the more improbable shots faced, the more goals against—we counter-intuitively observe that Black Swans and the goals they cause occur randomly in a hockey game, largely independent of the number of shots against the goaltender. Taking the 10 busiest goaltenders of the 2010–2011 season, we see that their save percentage generally rises as they receive more shots against. It does not matter whether the team gives up 20 shots or 40 shots; the random Black Swan occurrences that result in goals will happen just as frequently, regardless of the shots against. In outings where those goaltenders faced more than 40 shots, the average save percentage and shots against were 94.63% and 43.51, respectively. This implies these goaltenders gave up, on average, 2.33 goals per game when facing more than 40 shots. When these same goaltenders faced fewer than 20 shots, their save percentage was a paltry 82.17% on an average of 14.85 shots. This implies 2.64 goals against per outing where the goaltender faced fewer than 20 shots.§ Counter-intuitively, they fared worse while facing less than half the shots.
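The goals-against figures quoted above follow directly from the reported save percentages and average shot counts. A quick sketch of the arithmetic, using only the numbers in the passage (the text's own figures round slightly differently):

```python
# Expected goals against = shots faced * (1 - save fraction).
# The save percentages and shot counts are the figures reported above
# for the 10 busiest goaltenders of the 2010-11 season.
def goals_against(save_pct, shots):
    """Convert a save percentage and an average shot count into goals against."""
    return shots * (1 - save_pct / 100)

heavy = goals_against(94.63, 43.51)   # outings with more than 40 shots against
light = goals_against(82.17, 14.85)   # outings with fewer than 20 shots against

print(f"40+ shots: {heavy:.1f} goals against per game")
print(f"<20 shots: {light:.1f} goals against per appearance")
```

Despite facing nearly three times the shots, the busy goaltenders gave up roughly the same number of goals per outing, which is the counter-intuitive point the passage makes.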

The frequency of the ‘Black Swan’ occurrences that lead to goals appears to be largely independent of shots on goal. ‘Playing the percentages’ leaves every goaltender hopelessly exposed to random chance throughout the game. Goaltenders in the world’s best league do no better in absolute terms when they face 20 shots than when they face 40. They are the same goaltenders; they just fall victim to circumstance and luck.

Simply ‘playing the percentages,’ with an emphasis on blocking from the butterfly, leaves the goaltender’s fate up to pure chance. No goaltender can consistently out-perform their peers by playing the percentages—at least, not with certainty. Hoping to block 90% of the net while relying on your team to limit quality opportunities will result in mediocrity. The Black Swan events that lead to goals occur randomly, and just as frequently facing 15 shots as 50. This has manifested itself in ‘average’ goaltenders’ performances fluctuating unpredictably from game to game and from season to season. In a game where random luck is prevalent, employing a strategy that struggles to adjust to the complexities of a sport as dynamic as hockey will produce erratic and unexplainable outcomes.
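The claim that Black Swan goals arrive at a roughly fixed rate per game, independent of shot volume, can be illustrated with a toy simulation. The fixed rate of 2.5 goals per game below is an assumption chosen purely for illustration, not an NHL figure; under that assumption, save percentage rises mechanically with shots against, mirroring the pattern described above:

```python
import math
import random

random.seed(1)

# Toy model of the essay's claim: if goals come from 'Black Swan' chances
# at a roughly fixed rate per game -- independent of shot volume -- then
# save percentage rises mechanically with shots against.
GOALS_PER_GAME = 2.5  # assumed fixed Black Swan rate (illustrative only)

def poisson_sample(lam):
    """Draw a Poisson-distributed count (Knuth's algorithm, stdlib only)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while p > threshold:
        k += 1
        p *= random.random()
    return k - 1

results = {}
for shots in (15, 30, 50):
    games = 50_000
    # Goals per game are drawn independently of the shot count, capped at shots faced.
    goals = sum(min(shots, poisson_sample(GOALS_PER_GAME)) for _ in range(games))
    results[shots] = 1 - goals / (shots * games)
    print(f"{shots} shots against: simulated save% = {results[shots]:.3f}")
```

Under this assumption a goaltender facing 50 shots posts a far better save percentage than one facing 15, even though both surrender the same 2.5 goals on average—which is exactly the 40-plus-shot versus sub-20-shot pattern the previous section reports.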

The Challenge to the Contrarian

This creates a counter-intuitive result: the prototypical, ‘by the book’ goaltender will likely be subjected to greater fluctuations in performance, despite having a technical mastery of the position that suggests a level of control. Instead, it is the contrarian, with no attachment to the ‘proper’ way to make the save, who will achieve more consistent results. The improvisational nature of a Tim Thomas stick save may appear out of control, but his approach to the game yields more consistent results. Aggressiveness and assertiveness allow the contrarian to make saves when there is no technical road map to the proper position on a Black Swan shot. Consider the attributes necessary to make an incredible save. Physical attributes vary among NHL goaltenders, but not by much. Height, agility, reflexes, and other critical skills for any professional goaltender cluster around a certain standard. The mental approach to the game, on the other hand, can vary between goaltenders by magnitudes. Goaltenders can become robust against the effects of Black Swans by having the creativity to reach pucks ‘technicians’ could not, and the courage to abandon the perceived safety of the butterfly. Decreasing the effects of Black Swans would be enormously valuable, and no theoretical limits (unlike physical ones) stand in the way. In a game containing the possibility of the extreme, it is the contrarian goaltender who will best be able to prevent goals against.

Leaving the safety of the ‘butterfly style’ can be dangerous for a goaltender. Coaches, managers, analysts, and peers will be quick to realise when a goal could have been stopped by a goaltender passively waiting in their butterfly. These ‘evaluators’ and ‘experts’ have subscribed to the ‘average’ goaltender paradigm for over a decade. After game 5 of the 2011 Stanley Cup Final, Roberto Luongo suggested that the only goal of the game against Tim Thomas would have been “an easy save for (him).” Proactively mixing save strategies does leave the contrarian potentially exposed to the unconventional goal against. Improbable, unconventional saves are great, but coaches and managers really only care about goals against. They can forgive a goal if it was not the fault of the goalie—the perfect shot or improbable bounce that preys upon the passive butterfly goaltender. Just don’t pass up the opportunity to make an easy save and get scored on, contend the experts (luckily, Thomas was able to put together the greatest season of any goaltender in the modern game, so he got a pass). Playing the game freed from the ‘butterfly-first’ doctrine is a leap of faith, but it gives the goaltender the opportunity to contribute something positive to their team: wins.

Consider the great Martin Brodeur—the winningest goaltender in NHL history has often been discredited for playing behind strong defensive clubs while winning games and championships. However, random Black Swan chances have little regard for the number of shots against, as we have seen. So why does Martin Brodeur have the most victories of any goaltender in NHL history? I would give a large amount of credit to his ability to make the ‘key save’ on the unlikely chance against. These saves would not necessarily manifest themselves noticeably at the end of the game or in any statistically significant way—rather, they are randomly distributed throughout the game, as Black Swans are. Remember that, while New Jersey has traditionally been strong defensively, they have averaged 16th in the league in scoring during Brodeur’s tenure. With this inconsistent (and at times lethargic) goal support, Brodeur’s win totals remained remarkably consistent. During his prime he recorded at least 37 victories in 11 consecutive seasons. The low-scoring years required extreme focus and competency. Where the game could hinge on one great play or bad bounce, Brodeur preserved victory more than any contemporary by being vigilant against Black Swan chances. You can make the argument that the low shot totals (and the subsequent merely ‘good’ save percentage) led to him being overrated considering his absolute success. However, Black Swans are somewhat independent of shots against, and until his detractors understand how three ‘Brodeur-only saves’ were the difference in a 3–2 win in a game where New Jersey gave up only 23 shots, the winningest goaltender of all time will continue to be regrettably underrated, except for where it counts. No statistical analysis can measure the increased importance of a save that preserves victory compared to a save without that pressure.

Conclusion

I felt it was important to actively think about the strategies that have permeated the goaltending position and the impact they have had on goaltending performance. It was also important to liberate my thinking from too much quantitative analysis, focusing instead on the qualitative relationships between goaltender strategy, the random nature of the position, the goaltenders that consistently exceed the norm, and the goaltenders that will always be products of circumstance. None of this could be done with traditional goaltender metrics; they do not even begin to consider the possibility of the Black Swan opportunity against. Traditional statistics can be manipulated to underrate the winningest goaltender of all time. Winning is sport’s sole objective; the goaltender always has some influence on winning, so goaltender wins are important. Traditional statistics lead to complacency with ‘average’ goaltending, which is goaltending that adds nothing to the bottom line—winning. Leaving these statistical constraints behind can help clarify the connection between strategy and the contrarian, and then between the contrarian and success.

Based on this philosophical analysis, I believe goaltenders should unsubscribe from the conventional goaltending handbook and aggressively mix their save selections, helping them remain robust against the inevitable Black Swan opportunities against them. This will allow them to exceed ‘expected’ performance, and ultimately win more games.

____________________________________________

* A 4% increase in save percentage is significant; this is analogous to saying goaltenders gave up 48% more goals on the same number of shots in 1982 than in 2011.

* While the butterfly style may be generic, each goaltender has relative strengths and weaknesses. NHL shooters will eventually expose these weaknesses unless the goaltenders can successfully vary their strategy (remain unpredictable).

In the ‘modern’ game-theory example, the goaltender would have to react the vast majority of the time to force the shooter to mix between shooting high or low (which is ideal for the goaltender). By doing so the goaltender can exert their influence on the shooter, as opposed to simply accepting that a great shot or lucky bounce will beat them.

• A term borrowed from Nassim Nicholas Taleb and his book The Black Swan: The Impact of the Highly Improbable. Black Swans, named after the rare bird, represent the improbable and random occurrences in hockey and in life. Just because we cannot conceive of a particular challenge, nor have prepared for it, does not mean it will not happen. ‘Black Swans’ are unpredictable, can have a large impact (a goal), and are the result of an ecosystem far too complex to predict (10 players, a puck, and physics create infinite possibilities). Events are weakly explained after the fact (you held your glove too high), but in reality the causes are much deeper and impossible to predict.

* While I would argue some goaltenders are better equipped to handle ‘Black Swan’ opportunities against them, these difficult, unforeseen events will still be approximately proportional to the number of goals they give up. NB: Tim Thomas is not included in this list.

This ‘extreme’ case happened 47 times out of the 677 games collectively played.

• Many of these games saw the goaltender pulled, so the goals against is ‘per appearance’ rather than ‘per game.’ While it may be argued that these goaltenders simply ‘didn’t have it’ in these games, I would argue that more often they faced a cluster of bad luck and improbable chances against them. The total sample size is 60 games.

This attitude may explain the regression in Luongo’s game over the last couple of seasons. He was once a 6’3” goaltender with freakishly long limbs who would reach pucks in unconventional and spectacular ways. Now he views himself as a pure positional goaltender who is better off on the goal line than aggressively attacking shots against him. Apparently it is better to look ‘good’ getting scored on multiple times than to look ‘bad’ getting scored on once.

The standard deviation is 10 places; essentially all over the map, from leading the league in goals for to finishing last in goals for.