Having broken my silence with one blog, I’ve written another one already! In preparation for this, I created a data visualisation (viz) a few months ago with the idea of seeing how people in cricketing circles responded to it. The data set I used was First Class domestic matches played between 2012 and 2015 for the England squad and three other opening batsmen.
While this blog focuses on England cricketers again, it comes with a couple of different takes. One is the question of who should partner Alastair Cook in the team. The other is squad/team selection: how to pick a balanced team that can win a lot of cricket matches. These were debating points in the media while I created the viz, but are slightly old hat now. I apologise, but I think that it still makes some valid points.
To cover some points regarding the viz: it only uses data from innings where a player scored 30 or more runs. As my main interest was in opening batsmen, I felt this was a good idea: opening batsmen often face the most difficult conditions in cricket – a new ball, fresh pitch, and fresh bowlers – and are therefore far more likely to lose their wicket early. To put everyone on a level playing field, I felt 30+ runs was a good starting point for this analysis.
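The 30+ runs filter is simple to express in code. Below is a minimal sketch in Python, assuming each innings is stored as a small record. The `Innings` type, the `qualifying_innings` helper and the sample scores are all made up for illustration and are not taken from the data set behind the viz.

```python
from typing import NamedTuple

MIN_RUNS = 30  # threshold stated in the post: innings of 30 or more runs


class Innings(NamedTuple):
    # Hypothetical record layout, not the layout of the real data set.
    batsman: str
    runs: int
    balls_faced: int


def qualifying_innings(all_innings, min_runs=MIN_RUNS):
    """Keep only the innings in which the batsman scored min_runs or more."""
    return [inn for inn in all_innings if inn.runs >= min_runs]


# Invented sample scores, purely to show the filter working.
sample = [
    Innings("Opener A", 4, 11),
    Innings("Opener A", 30, 71),
    Innings("Opener A", 112, 205),
    Innings("Opener B", 29, 40),
]

kept = qualifying_innings(sample)
```

Only the innings of 30 and 112 survive here; the early dismissals for 4 and 29 are dropped before any averages are taken.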
With all that covered, here’s the viz:
As you can see, there are four elements covered. The first is batting consistency, shown with box and whisker plots. The shorter the grey box, the more consistent a player is. Taller top whiskers indicate greater ability to score ‘big runs’, while any mark above the top whisker is deemed to be an outlier – i.e. a rare occurrence for that particular batsman. The second, batting style, plots average boundary percentage against average runs scored, with four quadrants broadly describing a batsman’s style of play. The next component is batting tempo, with average balls faced plotted against average runs scored – again four quadrants give a broad description of the tempo a batsman plays at. Finally, there is an innings forecast for the four players thought to be in the running to open the batting with Alastair Cook in the series against Pakistan (spoiler – it was Alex Hales).
So here’s batting consistency. To elaborate on the basic points made during the introduction, Joe Root is – by a wide margin – the batsman most likely to score ‘big runs’ of the selected players, shown by the height of the whisker for his plot, while Jonny Bairstow is the more consistent high scorer, with the highest starting point for the box of his plot. It’s interesting to note that Ben Stokes appears to be the most consistent player, although this is countered by the fact he rarely goes on to score ‘big runs’ – often scoring fewer than 70 runs (and don’t forget, these are innings of 30+ runs). What else does it say for the potential opening partners for Cook? Well, Zafar Ansari is a consistent scorer but doesn’t go on to score the sort of ‘big runs’ you would expect of someone playing the majority of their cricket at the batsman’s paradise that is the Oval. Mark Stoneman, on the other hand, outperforms Ansari relatively comfortably, especially as The Riverside is a tough ground to score runs on. Adam Lyth is even better than Mark Stoneman when it comes to these measures, but is completely outdone by Alex Hales. Hales scores higher more consistently, and is more able to make ‘big runs’. By this measure, Alex Hales is the obvious choice to open with Cook.
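For anyone who wants to reproduce the numbers behind a box and whisker plot, here is a rough sketch in Python. It assumes the common Tukey convention, where whiskers reach the furthest score within 1.5 times the box height and anything beyond is an outlier, and it uses the "inclusive" quartile method from Python's `statistics` module. Both are assumptions: the tool that drew the viz may compute its boxes differently. The `box_summary` helper and the scores are invented for illustration.

```python
import statistics


def box_summary(scores):
    """Summarise a batsman's scores the way a box and whisker plot does.

    Assumes Tukey-style whiskers (1.5 x interquartile range); the plotting
    tool behind the viz may use a different rule.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box: smaller = more consistent
    upper_fence = q3 + 1.5 * iqr
    lower_fence = q1 - 1.5 * iqr
    inside = [s for s in scores if lower_fence <= s <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "bottom_whisker": min(inside),
        "top_whisker": max(inside),    # taller = more 'big runs'
        "outliers": sorted(s for s in scores if s < lower_fence or s > upper_fence),
    }


# Invented scores (all 30+), not real innings.
example = box_summary([30, 35, 40, 45, 50, 55, 60, 65, 200])
```

In this made-up sample the box runs from 40 to 60, the top whisker stops at 65, and the lone 200 is flagged as an outlier rather than stretching the whisker.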
With batting style, average boundary percentage was plotted against average runs scored, and the quadrants identify – broadly speaking – what style of play a batsman has. This helps in understanding squad selection, with one or two players selected from each quadrant. There is a reason I don’t think Mark Stoneman will be selected for England, and it’s a similar gripe to the one eloquently put forward by Omar Chaudhuri about James Vince. As mentioned, the plot broadly describes a batsman’s style, but the players also generally fit into groups based on their batting position. The bottom right corner of the scatter plot should generally be allrounders (number 6/7 in the batting order), the top right should be more attacking batsmen (number 4/5), the top left should be the best players (number 3/4), and the bottom left should be the openers. You can see that Mark Stoneman is firmly in the quadrant for allrounders, and Zafar Ansari, unfortunately for him, is an outlier for a bad reason. He is far too slow at scoring to be included in a Test Match XI. Adam Lyth is a good player to select for balancing a squad, as he accumulates his runs but does so at a similar boundary percentage to Cook. Then again, so is Alex Hales. He’s different insomuch as he gives the batting line-up a dynamism it lacks, although perhaps opening the batting isn’t his best position. Both players have their plus points.
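The quadrant idea can be sketched in a few lines of Python. Two things here are assumptions rather than facts from the viz: that boundary percentage runs along the horizontal axis with average runs up the vertical axis (which is how the quadrant descriptions above read), and that the dividing lines are passed in by hand, since the post does not say where they sit. The function names and the figures in the example are hypothetical.

```python
def boundary_percentage(fours, sixes, runs):
    """Share of an innings' runs that came in boundaries, as a percentage."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return 100.0 * (4 * fours + 6 * sixes) / runs


def style_quadrant(avg_boundary_pct, avg_runs, pct_split, runs_split):
    """Place a batsman in one of the four style quadrants.

    Assumed layout: boundary percentage on the horizontal axis, average runs
    on the vertical axis. pct_split and runs_split are the dividing lines and
    are hypothetical inputs, not values taken from the viz.
    """
    vertical = "top" if avg_runs >= runs_split else "bottom"
    horizontal = "right" if avg_boundary_pct >= pct_split else "left"
    return f"{vertical} {horizontal}"


# Invented example: 10 fours and a six in an innings of 92.
pct = boundary_percentage(fours=10, sixes=1, runs=92)

# Invented averages and dividing lines, just to show the labelling.
quadrant = style_quadrant(avg_boundary_pct=60, avg_runs=55, pct_split=50, runs_split=70)
```

With those invented numbers the batsman lands in the bottom right: plenty of boundaries, but not many runs, which is the allrounder corner described above.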
Tempo is similar to batting style, because generally those who score a lot of their runs in boundaries will face fewer deliveries for their runs. The main thing that stands out to me here is Jonny Bairstow. He is the only batsman in the England squad to be in the top left quadrant. He scores his runs quickly, and scores heavily. It’s easy to tell that he’s a vital part of the England team, despite some difficulties with the gloves. In the top right quadrant are players who score heavily and bat for a long time. It’s interesting to see Alex Hales in there, as it was often stated that he struggled to rotate the strike – this is some evidence that agrees with that assessment. The other difference is the slight separation between Adam Lyth and Gary Ballance. This can be attributed to their batting position. As Lyth is an opener you expect him to face more deliveries than Ballance. Ansari and Cook occupy the bottom left quadrant, and the obvious statement to make is that Ansari simply does not score quickly enough. As with the previous scatterplot, Alastair Cook is far closer to the other quadrants than some would give him credit for. He may bat more slowly, and accumulate more than England’s other players, but it isn’t by a lot. For Ben Stokes, there is little surprise that he is the player who spends the least time at the crease, and that quadrant is where we find Mark Stoneman. The other players who could open the batting for England all face 130 or more deliveries on average, against Stoneman’s 100 (Hales – 130, Cook – 140, Lyth – 145, Ansari – 170). The smallest difference is between Stoneman and Hales, at 30 deliveries per innings, with the largest difference coming between Stoneman and Ansari at 70 deliveries.
Working on the simple theory that a batsman tends to face 50% of the deliveries bowled in a partnership (yes, it could be a lot more or a lot less – but forgive me for simplifying), a gap of 30 to 70 deliveries faced means 60 to 140 deliveries bowled, so Stoneman would be in the middle for between 10 and 24 overs fewer than the other batsmen. Yes, he would score quickly, but in all likelihood that wouldn’t be enough of a trade-off – he would cost England valuable runs by not batting for long enough. All of which brings us to innings forecasting.
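The overs arithmetic can be checked with a short Python sketch. The average deliveries faced are the figures quoted in this post; the 50% share of the strike is the simplifying assumption already described, and the `overs_gap` helper is just an illustrative name.

```python
BALLS_PER_OVER = 6

# Average deliveries faced per innings, as quoted in the post.
avg_balls_faced = {
    "Hales": 130,
    "Cook": 140,
    "Lyth": 145,
    "Ansari": 170,
    "Stoneman": 100,
}


def overs_gap(balls_a, balls_b, strike_share=0.5):
    """Extra overs bowled while batsman A is in, compared with batsman B.

    strike_share is the fraction of deliveries the batsman faces himself;
    0.5 is the post's simplification, not a measured value.
    """
    extra_balls_faced = balls_a - balls_b
    extra_balls_bowled = extra_balls_faced / strike_share
    return extra_balls_bowled / BALLS_PER_OVER


# How many more overs each alternative is at the crease than Stoneman.
gaps = {
    name: overs_gap(balls, avg_balls_faced["Stoneman"])
    for name, balls in avg_balls_faced.items()
    if name != "Stoneman"
}
```

Hales comes out at 10 overs more than Stoneman and Ansari at a little over 23, which is where the range of 10 to 24 overs comes from.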
For the innings forecast graph I have plotted a line graph for Ansari, Hales, Lyth and Stoneman (the faint, jagged line) and used a logarithmic trendline to ‘forecast’ how many runs a player would score if they kept facing an increasing number of deliveries. I did so because logarithmic trendlines are curved, indicating a rate of change in the data (in this case runs scored) that slows as it flattens towards a plateau. By plateau I am not suggesting there is a finite number of runs one can score in an innings, but factoring in the mental and physical fatigue a batsman encounters during a prolonged innings suggests that incredibly long innings will have three distinct phases for an opening batsman: getting your eye in (generally slow scoring at the start of an innings), feeling ‘in’ (generally an optimal scoring rate in the middle of an innings), and getting tired and getting out (generally slower scoring, possibly fewer boundaries). These assumptions are made from personal experience as a player, professional experience as an analyst, and intuition from watching a lot of cricket. Unfortunately I don’t currently have the right kind of data set to assess this, although to do so would be incredibly interesting! So to summarise my use of logarithmic trendlines – it’s not perfect, it doesn’t use a bespoke algorithm to generate a single number that will tell you what’s going to happen out in the middle, but it’s in the ballpark. Oh, and one last thing to reiterate. The data set I used is for innings of 30 or more runs, so the trendlines start at zero runs and immediately jump to 30 runs, which makes the first small section of each trendline irrelevant.
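A logarithmic trendline of this kind is usually a least-squares fit of runs = a × ln(balls) + b. The Python sketch below shows that calculation under that assumption; it is not necessarily the exact routine the charting tool runs, and the function names and sample points are invented for illustration.

```python
import math


def fit_log_trend(balls, runs):
    """Least-squares fit of runs = a * ln(balls) + b.

    Returns (a, b). Assumes every balls value is positive and that at least
    two different values are supplied.
    """
    if len(balls) != len(runs) or len(balls) < 2:
        raise ValueError("need two or more paired observations")
    xs = [math.log(b) for b in balls]          # fit a straight line in ln(balls)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(runs) / len(runs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("balls values must not all be the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runs))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def forecast_runs(a, b, balls):
    """Runs the trendline 'forecasts' for a given number of deliveries faced."""
    return a * math.log(balls) + b


# Invented (balls faced, runs scored) points, not real innings.
sample_balls = [60, 90, 120, 180, 240]
sample_runs = [32, 48, 61, 80, 91]
a, b = fit_log_trend(sample_balls, sample_runs)
```

One property worth knowing: on a curve like this, every doubling of deliveries faced adds the same number of runs (a × ln 2), so going from 100 to 200 balls adds as much as going from 200 to 400. The curve keeps flattening, but it never becomes completely level.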
With all of that out of the way, it would appear that Mark Stoneman scores his runs far more quickly early in his innings than Ansari, Lyth, and Hales, before his scoring rate slows and his trendline becomes much less steep than those of Lyth and Hales. Could it be that in his high-scoring innings Mark Stoneman scores his runs more slowly? As we have seen in the previous graphs, his average innings is played at a fast tempo, with a greater percentage of his runs being scored in boundaries. Does this change in his higher-scoring innings and, if so, why? Again, having a ‘ball-by-ball’ dataset would help answer these questions but I don’t have one. This particular graph shows, for the umpteenth time, that Ansari simply does not score heavily enough despite occupying the crease for long periods of time. Interestingly, despite their difference in styles and tempo, Alex Hales and Adam Lyth have very similar innings forecasts, with Hales narrowly coming out on top. I can see exactly why England selected him for the series against Pakistan. I do feel sorry for Adam Lyth, however. If not for the emergence of Haseeb Hameed, of Lancashire, the position of opener for the tour of Bangladesh could easily have gone to Lyth after Alex Hales withdrew over security concerns. I hope to do a little analysis on Hameed as soon as I can. Until then, that’s your lot.