Tuesday, May 29, 2018

Data Visualisation in Football

Today I was reminded of an article I wrote some years ago for Spiel Magazine - which sadly seems to have folded. I reposted it, with their permission, on another blog, which also now seems to have folded. Time flies, etc., etc. Anyway. The reminder was an article by David Rudin in StatsBomb called ‘How to Find Footballing Beauty in the Age of Stats’, which is well worth a read.

The following is the text of my article, which was originally published in Spiel in August 2014, in their Football and Visual Culture edition. It has been retrieved from the archive, so some links may be banjaxed. Thanks to guest editor Musa Okwonga for commissioning me to write it.


À chacun son goût, but there are rules. Perspective (little used in football, maybe); balance (ditto); focal points. Artists and architects work with the same principles observed by botanists and biologists. Patterns and symmetry form the norm - the exceptions are the outliers (or, in fiction, outsiders) which provoke a sense of surprise, even shock, discomfort, and notice. There may only be five (or seven) plots in literature, but there are three in football - win, lose or draw - and once you throw in underdog status, upsets, extra time and penalties, things get worryingly Aristotelian. A beginning, a middle and an end versus a game of two halves.

There are formation diagrams, from the pyramid to the W-M to the strings of numbers reeled off like new Fibonacci sequences (4-4-2, 4-2-3-1, 3-5-2), a fundamental part of any match centre or #journo tweet before a match kicks off. Chalkboards, messier in their representation of the action once that first, perfectly balanced starting point actually starts to move around. Heatmaps, showing density of action spreading like fire or fog, colour-coded pixels or columns rising like a reboot of SimCity, and passing diagrams, forests of arrows urging you to find the needle in the haystack. During the World Cup, there were MatchStory’s beautifully simple moving treemaps of headline stats, and Infostrada’s elegant wave visualisations of each match. And of course there are less successful approaches, such as drawing triangles on a screenshot of a midfield, as if there were any configuration of five players pinging the ball about that wouldn’t form them. Triangles. Everywhere there’s b----y triangles.

The human brain is designed to see patterns in things, to seek them out - this can mean confusing correlation and causation, or focussing on the number that feels right instead of the number that matters. So, what is the Golden Ratio for football? First it was possession, assumed to mean victory, but possession can be sterile domination (Ajaccio 2, Lyon 1). Then pass completion, problematic for an individual player in a single game given the need for context (position, direction, colleagues, opposition). Then shots per game versus shots per goal. All of these are steps along the way to finding the true pattern, which can still be blown apart by the only number that matters. As a data-visualising geek with access to a powerful calculation engine, I might say IF (CONTAINS([Back of net],[Ball])) THEN COUNT_VALUES([Ball]) ENDIF.

Looking for patterns is a long-term endeavour, and regression to the mean is commonplace. Get too close to too small a dataset and you can be as misled as when you are one person in one seat, holding one coloured piece of card and thinking ‘well, I feel a bit silly’, only to see later that your piece of card was just one tile in a vast mosaic, one Seurat spot in a giant painting. Tifos. Along with flares, standing up, and ‘negotiation’, these elements of the game can feel exotic to viewers of certain leagues. Tifos are data as well: sometimes unstructured, like the huge single centrepiece that Borussia Dortmund fans hoisted on the way to the 2013 Champions League Final, sometimes highly structured, like its backdrop or Raja Casablanca’s tribunes for most of the season.

Because football is not just about what the players do on the pitch but how we, the observers, in the stands or not, react to it. We like balance - grumbling about players wearing the wrong numbers for their positions, reeling off those neat formations, appreciating midfield diamonds, wondering if Laurent Blanc playing two right-backs against Spain was a Picasso-esque rejection of symmetry (as well as a pre-kick-off admission of defeat) - and we like patterns. Oh, the joy of finding a player with a name that will fit into a punning XI, whether mythological (Hydra Helguson, he’s a centaur forward), poetic (Oscar Wilde and Willian Wordsworth) or artistic (Gustav Klimt Hill). Yes, Zlatan Ibrahimovic is one of the best players in the world, but his is an easier name to copyright than to pun with; here, as the European Football Show showed, Demba Ba is better. The words we use about football may often be ugly, but they can be made beautiful, as in the explosions of light on one of CartoDB’s maps of geotagged tweets as Real Madrid won La Décima.

Poster series, sticker albums and #gotgotneed, wallcharts - all data: sets, percentages, organigrams. Memes, gifs, charts, shares - matches are datasets and players are variables (among others: location, day of week and meteorological conditions, concatenated into something that speaks to every fan despite having little if any statistical basis). Colour-coding is not just a factor in those heatmaps or chalkboards but a personal identification, so much so that it can stand in as an alias - Blaugrana, Nerazzurri, Albiceleste, les Bleus.

These patterns intrigue us, but not as much as the outlier, the outsider - the moment of magic, the goal from nothing. The words we use - beautiful, impossible, Pirlo - speak of an emotional reaction that outweighs all logic. Numbers can be beautiful but never let numbers be the only beauty.

It’s a beautiful game.