Analytics: The problem with goalkeeping stats

Bayern Munich goalkeeper Manuel Neuer. (Matthias Schrader/AP)

Goalkeeping is an enigma in the soccer analytics community, one that is usually under-researched or avoided altogether.

The reason is that too many external variables bear directly on the position’s performance for it to be evaluated with confidence.

A goalkeeper must complete many tasks to fulfill the role successfully, and most of them cannot be directly quantified and attributed to the player in the position.

Besides saving shots, goalkeepers must: direct defensive shape, assign and reinforce defensive roles in set-piece situations, eliminate or limit rebounds after saves, distribute safely to maintain possession or start an offensive move, eliminate or limit corner kicks for the opposition through saves, and be an extra defender outside of the box to support a high defensive line, among other responsibilities.

Chart: MLS teams' starting goalkeepers' save% plotted against team goals against, 2011-2014, with the width of each data point representing the team's end-of-year points tally. Data from American Soccer Analysis.

Traditionally, a goalkeeper would be rated on what is perceived as the most important task: saving the ball. One of the only statistics that measures this directly is save%, calculated as (Saves/Shots on Target Against)*100. But, as many analysts have shown (Colin Trainor, 11tegen11, Paul Riley and Sam Gregory), this statistic is tricky: it cannot encompass a goalkeeper's performance and does not hold up from season to season as a way of qualifying the position.
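The save% formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name and the season totals are hypothetical and are not taken from the MLS data referenced in this article.

```python
def save_percentage(saves: int, shots_on_target_against: int) -> float:
    """Return save% = (saves / shots on target against) * 100."""
    if shots_on_target_against <= 0:
        raise ValueError("shots_on_target_against must be positive")
    if not 0 <= saves <= shots_on_target_against:
        raise ValueError("saves must be between 0 and shots_on_target_against")
    return saves / shots_on_target_against * 100


# Hypothetical season totals, invented purely for illustration.
keeper_a = save_percentage(saves=90, shots_on_target_against=120)
keeper_b = save_percentage(saves=70, shots_on_target_against=100)

print(f"Keeper A save%: {keeper_a:.1f}")
print(f"Keeper B save%: {keeper_b:.1f}")
```

Note that the inputs are only saves and shots on target; nothing about shot quality, rebounds, distribution or any of the other responsibilities listed earlier enters the calculation.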

Goals are the real issue in quantifying most things in soccer. Because there are so few goals, these tallies become extremely valuable, and because of their value they tend to skew the idea of success within the sport.

A team that is playing well will win games

Although this should hold true over a large sample size, it is a dangerous pitfall in analysis in general. What a manager and team want to do is play well to increase their opportunities to achieve whatever outcome they desire (usually winning, but sometimes drawing a match).

If these desires are translated into successful play, then the results should follow. The issue, positionally, is breaking down each contribution to the overall success of the team. Nowhere is this more difficult than with goalkeepers.

In attempts to understand goalkeepers, the focus has consistently been on the act of saving, whether through straight save% or through adjustments based on the quality of the shot being saved, or not saved.

“Goalkeeping is multifaceted, and we have to measure other aspects of play to see how persistent they are and how much they contribute to results,” said Daniel Altman of North Yard Analytics.

Although this fluctuates from goalkeeper to goalkeeper, the number of repetitions in matches over a single season is not enough to determine whether a player is superior to the surrounding competition. Instead, the players and coaching staff will have more evidence and “data” from training on which to base a decision about the goalkeeper's ability to make a variety of saves that satisfy specific demands.

To understand a goalkeeper in a game state, and to collect the full spectrum of information needed for a more complete picture of the position, the external variables must be addressed.

These external factors must be organized according to positional priority, or what would be demanded of a goalkeeper. Depending on the team, the list would be organized as follows:

• Organizing the defence
• Minimizing quality rebounds into vulnerable areas
• Claiming crosses within a defined range
• Distribution
• Minimizing the conceding of corner kicks off of saves
• Defensive contribution with the feet

As stated previously, this order can be reshuffled based on the priorities of the team. For example, a team that incentivizes possession from deep-lying positions and relies on the goalkeeper's distribution to start advantageous offensive moves might place “distribution” higher (see Pepe Reina at Napoli, or Hugo Lloris at Tottenham). Or, if a team plays a high defensive line, it might want the goalkeeper to be a defensive point of contact when the opposition plays long, ranging passes (see Manuel Neuer at Bayern Munich).

The concentration on measuring goalkeepers may have seeded the first priority, shot-stopping, properly, but it has completely missed the mark in understanding the position as a whole. Don't come knocking until we can figure out the external variables.

Coleman Larned is a soccer analytics writer based in Antwerp, Belgium. Follow him on Twitter.