Saturday, May 5, 2012

Caveat Statto: The Limitations of the Statistical Approach

Premiership Stats Visuals: See Dynamic Dashboard

Defining Terms

In a dataset for a football match, certain actions are fixed and others are a matter of opinion; the fixed ones, however, are fixed by the referee, which is the first problem.  The whistle is blown for fouls, freekicks, corners and goalkicks are awarded, offside calls made, cards handed out – all these things happen, and are therefore included in the dataset.  They may, however, be wrong.  This has a knock-on effect on the dataset; take, for example, QPR’s disallowed goal against Bolton in March.  Where is that in the dataset?  It was definitely a shot; it was definitely on target.  It was also definitely a goal by usual objective standards, but not according to the referee-defined objectivity of the game.  In recording that action, the observer must not designate it ‘a goal’, as it wasn’t (although it was); the decision then feeds into the realm of opinion when the action is recorded in the shots statistics.

In that realm, there is further confusion; the recording of shots and shots on target is ostensibly a record of fact, but that is not necessarily the case.  When Emile Heskey’s attempt against Manchester United sheared off for a throw-in, is that recorded as a shot?  He clearly meant to shoot, but if the ball ends up way over there, it can cause confusion – would any forward (or sideways) pass in the attacking third then be a shot?  No, but only because they are not intended as such.  Then we are reduced to considering the motives of players in performing particular actions, which is frowned upon in other circumstances, such as when considering whether or not player A is that kind of player, or in ‘but did he mean it?’ situations (eg Olivier Giroud – 25 secs in, Papiss Demba Cissé, Tim Howard).  There it really doesn’t matter if he did or didn’t; it went in, and thus the result defines the previous action.

A further example of this difficulty is shown in the different records that may exist for the same match; in the Manchester Derby, there was much talk of Manchester United not mustering a shot on target – according to the dataset used in my analysis, they managed 4 shots, 2 of which were on target, and thus did not ‘do a Blackburn’ (where no shots are recorded in their match v Tottenham).  In this realm, therefore, there will also be variances between different datasets.

Between these two positions are other actions that definitely happen, but without the sanction of the referee: passes completed, tackles won, etc.  These simply need to be seen and recorded by the observer and included in the dataset.  This inclusion is, however, factual rather than entailing any particular judgment of quality, which is connected to our second problem.

Which Metrics? Quantifying Quality

No single metric can define a match; it is of course getting the goal in the back of the net that counts, but with the other team attempting to do that too (unless they are Blackburn playing Tottenham), even goals scored is not sufficient.  ‘Points’ is the final definer of a result, of course, but is in itself a result of a combination of actions rather than an action in itself.

Some metrics, such as possession, pass completion rate, and assists, are simultaneously lauded and derided as measures of quality.  The first two in particular are used to demonstrate the dominance of a team, which mostly works as the highest performers in these areas tend to be Barcelona; however, there is no causation here (see next section).  When Swansea played Newcastle in April, they had 77% possession, and completed 835 passes to Newcastle’s 181.  They also lost 2-0.  Thus, high possession and pass completion rates are useful in terms of potential, but that potential still has to be realised.

The assist is a tricky beast: it could be a beautiful piece of individual skill to set up a tap-in, or just the last mug to touch the ball before the striker did all the work.  Here the French term, a ‘decisive pass’, seems more useful; otherwise Hazard’s rabona against PSG would probably not count as ‘an assist’, as it bounced off De Melo first before Roux got to finish.  The same can be said of goals, of course, but as they are used primarily as a team metric, and to define individual performance only as a subsidiary, this is less pronounced.

As statistical analysis becomes more prevalent in the footballing discourse, there is occasionally the feeling that analysts are searching for more esoteric metrics to distinguish themselves from the ball-in-the-back-of-the-net crowd.  This can make life difficult.  An example: shooting accuracy might be considered a good reflection of quality, but if we look at that metric alone (% of shots that are on target) a slight drawback emerges (Fig. 1).

Fig. 1 - Best Shooting Accuracy by Team

Alternatively, when Arsenal were shipping goals all over the place early on in the season, there was still an insistence that Wojciech Szczesny is a fine goalkeeper (and that David de Gea might not be).  Looking at the rankings for save rate over the season (% of shots on target against that do not result in a goal) is similarly surprising from that perspective (Fig. 2 - and Manchester United are at the top of this chart).

Fig. 2 - Worst Save Rates by Team
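
For anyone wanting to reproduce the two percentages described above, here is a minimal sketch in Python.  The function names and the save-rate figures (10 and 3) are my own illustrative assumptions, not taken from the dataset behind the charts; the 4 shots and 2 on target are the Manchester United figures from the derby quoted earlier.  It also assumes that every goal conceded came from a shot recorded as on target, which own goals would break.

```python
from typing import Optional


def shooting_accuracy(shots: int, shots_on_target: int) -> Optional[float]:
    """Percentage of shots that are on target; None if no shots were recorded."""
    if shots == 0:
        return None  # nothing to divide by (a match with no shots at all)
    return 100.0 * shots_on_target / shots


def save_rate(shots_on_target_against: int, goals_conceded: int) -> Optional[float]:
    """Percentage of shots on target against that do not result in a goal.

    Assumes every goal conceded is counted among the shots on target against.
    """
    if shots_on_target_against == 0:
        return None
    saved = shots_on_target_against - goals_conceded
    return 100.0 * saved / shots_on_target_against


# Manchester United in the derby, per the dataset quoted above: 4 shots, 2 on target.
print(shooting_accuracy(4, 2))    # 50.0
# The same percentage from five times the volume (hypothetical figures).
print(shooting_accuracy(20, 10))  # 50.0
# A match with no shots recorded has no accuracy at all.
print(shooting_accuracy(0, 0))    # None
# Hypothetical: 10 shots on target faced, 3 goals conceded.
print(save_rate(10, 3))           # 70.0
```

Note that both percentages throw away volume: 2 from 4 and 10 from 20 come out the same, and a team with no shots cannot be ranked at all, which is one reason neither is much use on its own.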

If no single metric can stand alone in match analysis, a combination of metrics may be more useful.  However, none can define success.

Cause and Effect – Prophesying the Past

Win more corners and you’ll win more games, as, hopefully, the saying doesn’t go (Fig. 3).  Statistical analysis can assume causality from a metric that is actually an effect: attack more, and a team is more likely to win corners; they are also more likely to win.  Both are results of attacking more, but both are also then used to define the level of attack (circular reference warning ahoy).  Analysis can be dependent on results, and the interpretation of the metrics in the dataset behind that result can therefore change to fit the narrative, eg Barcelona won because they had more possession; Barcelona lost because they didn’t capitalise on their possession.  The second statement (guess which match) is more accurate, and also gives the lie to the causality assumed in the first.

Fig. 3 - Most Corners Won by Team by Match
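
The corners point can be illustrated with a toy simulation.  This is a sketch under invented assumptions, not a model of any real dataset: each match gets a hypothetical ‘attack level’ between 0 and 1, corners come from 12 attacking phases that each yield a corner with that probability, and the chance of winning is 0.1 + 0.6 × attack level.  Corners never enter the win calculation, yet they still correlate with winning until the attack level is held constant.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_matches, rng, fixed_attack=None):
    """Return (corners, wins) lists for n_matches simulated matches."""
    corners, wins = [], []
    for _ in range(n_matches):
        # Hypothetical attack level: random per match unless held constant.
        attack = rng.random() if fixed_attack is None else fixed_attack
        # 12 attacking phases, each producing a corner with probability `attack`.
        corners.append(sum(rng.random() < attack for _ in range(12)))
        # Win probability depends on attack only; corners play no part.
        wins.append(1 if rng.random() < 0.1 + 0.6 * attack else 0)
    return corners, wins


rng = random.Random(2012)  # seeded so the run is repeatable

corners, wins = simulate(5000, rng)
mixed = pearson(corners, wins)

corners_held, wins_held = simulate(5000, rng, fixed_attack=0.5)
held = pearson(corners_held, wins_held)

print(f"attack level varies between matches: r = {mixed:.2f}")
print(f"attack level held at 0.5:            r = {held:.2f}")
```

With the attack level varying, the correlation between corners and wins is clearly positive; with it held constant, it drops to roughly zero, because the only thing linking the two was the attacking itself.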

It’s the ball in the back of the net that counts, basically. Preferably the other team’s net.


Statistical analysis can be a useful addition to match reporting, but to me it is more useful in perceiving trends over a season than in explaining a particular result, still less in forecasting a game to come.  There are dangers at each end of the scale: over-reliance on particular metrics and an assumption of causality can lead to inconsistency as conclusions differ between matches; trying to take everything into account can render analysis so un-incisive that it is useless (or ends up being a simple statement of shit we already knew – you have to take your chances; or, Manchester City shoot quite a lot, Stoke don’t - Fig. 4).

Fig. 4 - Highest / Lowest Shots by Team

There is also the tension between objective and subjective in assessing the quality of a game – castigating Chelsea for playing ‘anti-football’ when they just beat arguably the best (subj) team in the world, by doing what had to be done, or lauding Swansea for playing beautiful football when they got beaten by Newcastle’s more direct approach, are two sides of the same coin.  A complex combination of metrics may approach expressing quality of play, but there is still no number that can adequately describe Cissé’s goal against Chelsea or Ben Arfa’s runs through confused defences.  The beauty of the beautiful game is difficult to convey other than by the use of the word woof.

There is also luck, of both flavours, and numerous hypotheticals around that – if Suarez hadn’t been bullied by a tree as a child, leading him to take revenge on woodwork wherever he sees it, if Harry Redknapp wasn’t using a dartboard to determine where Bale is going to play, if Arsenal had had a functioning set of defenders throughout the season, well then, things would have been different.  But luck is a matter of chance.  And then there’s the refereeing – if there was goal-line technology...

Finally, connected to the causality issue above, there is the danger of assuming X therefore Y or relying on preconceptions – under-estimating the other team, setting up not to lose and then going a goal down, being happy in possession but failing to take chances.  At the end of the day, it’s the ball in the back of the net that counts – you still have to play better than the other team.

My name is PhilippaB, and I am a functioning statoholic.  But I am striving to be self-aware.