Squad Selection in Cricket

Having broken my silence with one blog, I’ve written another one already! In preparation for this, I created a data visualisation (viz) a few months ago with the idea of seeing how people in cricketing circles responded to it. The data set I used was First Class domestic matches played between 2012 and 2015 for the England squad and three other opening batsmen.

While this blog focuses on England cricketers again, it comes with a couple of different takes. One is who should partner Alastair Cook in the team. The other is squad/team selection: how to pick a balanced team that can win a lot of cricket matches. These were debating points in the media while I created the viz, but are slightly old hat now. I apologise, but I think that it still makes some valid points.

To cover some points regarding the viz: it only uses data from innings where a player scored 30 or more runs. As my main interest was with opening batsmen I felt this was a good idea, as opening batsmen often face the most difficult conditions in cricket – a new ball, fresh pitch, and fresh bowlers – and are therefore far more likely to lose their wicket early. To put everyone onto a level playing field I felt 30+ runs was a good starting point for this analysis.

With all that covered, here’s the viz:

Dashboard 1.png

As you can see, there are four elements covered. The first is batting consistency, shown with box and whisker plots. The shorter the grey box, the more consistent a player is. Taller top whiskers indicate greater ability to score ‘big runs’, while any mark above the top whisker is deemed to be an outlier – i.e. a rare occurrence for that particular batsman. Batting style plots average boundary percentage against average runs scored, with four quadrants broadly describing a batsman’s style of play. The next component is batting tempo, with average balls faced plotted against average runs scored – again four quadrants give a broad description of the tempo a batsman plays at. Finally, there is an innings forecast for the four players thought to be in the running to open the batting with Alastair Cook in the series against Pakistan (spoiler – it was Alex Hales).

Batting Consistency.png

So here’s batting consistency. To elaborate on the basic points made during the introduction, Joe Root is – by a wide margin – the batsman most likely to score ‘big runs’ of the selected players, shown by the height of the whisker for his plot. Jonny Bairstow, meanwhile, is a more consistent high scorer, with the highest starting point for the box of his plot. It’s interesting to note that Ben Stokes appears to be the most consistent player, although this is countered by the fact he rarely goes on to score ‘big runs’ – often scoring fewer than 70 runs (and don’t forget, these are innings of 30+ runs). What else does it say for the potential opening partners for Cook? Well, Zafar Ansari is a consistent scorer but doesn’t go on to score the sort of ‘big runs’ you would expect of someone playing the majority of their cricket at the batsman’s paradise that is the Oval. Mark Stoneman, on the other hand, outperforms Ansari relatively comfortably, especially as The Riverside is a tough ground to score runs on. Adam Lyth is even better than Mark Stoneman when it comes to these measures, but is completely outdone by Alex Hales. Hales scores higher more consistently, and is more able to make ‘big runs’. By this measure, Alex Hales is the obvious choice to open with Cook.
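For anyone wanting to reproduce the consistency plot, here is a minimal sketch of how the box, whiskers and outliers can be derived from a list of scores. It assumes the common Tukey convention (whiskers reach the furthest score within 1.5 times the box height), which may not match the settings used in the viz. The function name `box_stats` and the scores are made up for illustration.

```python
from statistics import quantiles


def box_stats(scores, min_runs=30):
    """Summarise innings of `min_runs` or more as box-plot statistics."""
    # Apply the 30+ runs filter described earlier in the post.
    kept = sorted(s for s in scores if s >= min_runs)
    q1, median, q3 = quantiles(kept, n=4, method="inclusive")
    iqr = q3 - q1  # height of the grey box: smaller means more consistent
    # Assumed Tukey fences: anything beyond 1.5 * IQR is an outlier.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [s for s in kept if low_fence <= s <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "box_height": iqr,
        "bottom_whisker": min(inside),
        "top_whisker": max(inside),  # taller means more 'big runs'
        "outliers": [s for s in kept if s < low_fence or s > high_fence],
    }


# Hypothetical scores for one batsman (not real data).
example_scores = [4, 12, 31, 35, 42, 48, 55, 61, 70, 88, 254]
stats = box_stats(example_scores)
```

In this invented example the 254 sits above the top whisker, so it would be drawn as a separate mark rather than stretching the whisker.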

Batting Style.png

With batting style, average boundary percentage was plotted against average runs scored, and the quadrants identify – broadly speaking – what style of play a batsman has. This helps in understanding squad selection, with one or two players selected from each quadrant. There is a reason I don’t think Mark Stoneman will be selected for England, and it’s a similar gripe to the one eloquently put forward by Omar Chaudhuri about James Vince. As mentioned, the plot broadly describes a batsman’s style, but the players also generally fit into groups based on their batting position. The bottom right corner of the scatter plot should generally be allrounders (number 6/7 in the batting order), the top right should be more attacking batsmen (number 4/5), the top left should be the best players (number 3/4), and the bottom left should be the openers. You can see that Mark Stoneman is firmly in the quadrant for allrounders, and Zafar Ansari, unfortunately for him, is an outlier for a bad reason: he is far too slow at scoring to be included in a Test Match XI. Adam Lyth is a good player to select for balancing a squad, as he accumulates his runs but does so at a similar boundary percentage to Cook. Then again, so is Alex Hales. He’s different insomuch as he gives the batting line-up a dynamism it lacks, although perhaps opening the batting isn’t necessarily his best position. Both players have their plus points.


Tempo is similar to batting style, because generally those who score a lot of their runs in boundaries will face fewer deliveries for their runs. The main thing that stands out to me here is Jonny Bairstow. He is the only batsman in the England squad to be in the top left quadrant. He scores his runs quickly, and scores heavily. It’s easy to tell that he’s a vital part of the England team, despite some difficulties with the gloves. In the top right quadrant are players who score heavily and bat for a long time. It’s interesting to see Alex Hales in there, as it was often stated that he struggled to rotate the strike – this is some evidence that agrees with that assessment. The other difference is the slight separation between Adam Lyth and Gary Ballance. This can be attributed to their batting position. As Lyth is an opener you expect him to face more deliveries than Ballance. Ansari and Cook occupy the bottom left quadrant, and the obvious statement to make is that Ansari simply does not score quickly enough. As with the previous scatterplot, Alastair Cook is far closer to the other quadrants than some would give him credit for. He may bat more slowly, and accumulate more than England’s other players, but it isn’t by a lot. For Ben Stokes, there is little surprise that he is the player who spends the least amount of time at the crease, and that quadrant is where we find Mark Stoneman. The other players who could open the batting for England all face 130 or more deliveries on average (Hales – 130, Cook – 140, Lyth – 145, Ansari – 170), against Stoneman’s 100. The smallest difference is between Stoneman and Hales, at 30 deliveries per innings, with the largest difference coming between Stoneman and Ansari at 70 deliveries.
Working on the simple theory that one tends to face 50% of the deliveries bowled in a partnership (yes, it could be a lot more or a lot less – but forgive me for simplifying), Stoneman would be in the middle for between 10 and 24 overs fewer than the other batsmen. Yes, he would score quickly, but in all likelihood that wouldn’t be enough of a trade off – he would cost England valuable runs by not batting for long enough. All of which brings us to innings forecasting.
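The overs calculation above can be written out as a short worked example. It uses the average deliveries quoted in the text and the same 50% simplification. The helper name `overs_lost` is made up for illustration.

```python
BALLS_PER_OVER = 6

# Average deliveries faced per innings (30+ run innings), as quoted in the text.
avg_balls = {"Hales": 130, "Cook": 140, "Lyth": 145, "Ansari": 170, "Stoneman": 100}


def overs_lost(batsman, other, share_of_strike=0.5):
    """Overs by which time at the crease shrinks if `batsman` bats instead of `other`.

    Assumes a batsman faces `share_of_strike` of the deliveries bowled while he
    is in the middle (the 50% simplification, not a measured figure).
    """
    fewer_balls = avg_balls[other] - avg_balls[batsman]
    # If he only faces half the balls, his time in the middle shrinks by double.
    balls_in_the_middle = fewer_balls / share_of_strike
    return balls_in_the_middle / BALLS_PER_OVER


gaps = {
    name: round(overs_lost("Stoneman", name), 1)
    for name in avg_balls
    if name != "Stoneman"
}
for name, overs in gaps.items():
    print(f"Stoneman vs {name}: {overs} overs fewer")
```

The gap runs from 10 overs (against Hales) to a little over 23 overs (against Ansari), which is the range quoted above once rounded up.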

Innings Forecast (Openers).png

For the above graph I have plotted a line graph for Ansari, Hales, Lyth and Stoneman (the faint, jagged line) and used a logarithmic trendline to ‘forecast’ how many runs a player would score if they kept facing an increasing number of deliveries. I did so because logarithmic trendlines are curved, indicating a rate of change in data (in this case runs scored) before reaching a plateau. For the plateau I am not suggesting there is a finite number of runs one can score in an innings, but factoring in the mental and physical fatigue a batsman encounters during a prolonged innings suggests that incredibly long innings will have three distinct phases for an opening batsman – getting your eye in (generally slow scoring at the start of an innings), feeling ‘in’ (generally an optimal scoring rate and in the middle of an innings), getting tired and getting out (generally slower scoring, possibly fewer boundaries). These assumptions are made from personal experience as a player, professional experience as an analyst, and intuition from watching a lot of cricket. Unfortunately I don’t currently have the right kind of data set to assess this, although to do so would be incredibly interesting! So to summarise my use of logarithmic trendlines – it’s not perfect, it doesn’t use a bespoke algorithm to generate a single number that will tell you what’s going to happen out in the middle, but it’s in the ballpark. Oh, and one last thing to reiterate. The data set I used is for innings of 30 or more runs – so the trendlines start at zero runs and immediately jump to 30 runs, so the first small section of the trendline is irrelevant.
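For readers who want to try something similar, here is a minimal sketch of fitting a logarithmic trendline by least squares. It assumes the usual trendline form, runs = a × ln(balls) + b, which may differ from what the charting tool does internally. The function names and the innings data are invented for illustration.

```python
import math


def fit_log_trendline(balls, runs):
    """Least-squares fit of runs = a * ln(balls) + b. Returns (a, b)."""
    xs = [math.log(b) for b in balls]  # fit a straight line against ln(balls)
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(runs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runs))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def forecast(a, b, balls):
    """Runs the fitted curve 'forecasts' for a given number of deliveries."""
    return a * math.log(balls) + b


# Hypothetical innings of 30+ runs as (balls faced, runs scored) pairs.
innings = [(45, 31), (60, 38), (90, 52), (130, 74), (180, 95), (240, 110)]
a, b = fit_log_trendline([i[0] for i in innings], [i[1] for i in innings])

for balls in (50, 100, 200, 400):
    print(balls, round(forecast(a, b, balls), 1))
```

One property of this curve is that every doubling of deliveries adds the same number of runs (a × ln 2), so each extra delivery is worth a little less than the one before. That is the flattening shape the trendline shows for a long innings.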

With all of that out of the way, it would appear that Mark Stoneman scores his runs far more quickly early in his innings than Ansari, Lyth, and Hales, before his scoring rate slows and his curve becomes much less steep than those of Lyth and Hales. Could it be that in his high-scoring innings Mark Stoneman scores his runs more slowly? As we have seen in the previous graphs, his average innings is played at a fast tempo, with a greater percentage of his runs being scored in boundaries. Does this change in his higher scoring innings – if so, why? Again, having a ‘ball-by-ball’ dataset would help answer these questions but I don’t have one. This particular graph shows, for the umpteenth time, that Ansari simply does not score heavily enough despite occupying the crease for long periods of time. Interestingly, despite their difference in styles and tempo, Alex Hales and Adam Lyth have very similar innings forecasts, with Hales narrowly coming out on top. I can see exactly why England selected him for the series against Pakistan. I do feel sorry for Adam Lyth, however. If not for the emergence of Haseeb Hameed, of Lancashire, the position of opener for the tour of Bangladesh could easily have gone to Lyth after Alex Hales withdrew over security concerns. I hope to do a little analysis on Hameed as soon as I can. Until then, that’s your lot.


After A Brief Hiatus…

It’s been a long time since I was able to collect my thoughts and reflect on everything that’s happened since my last blog. In that time, I’ve started working for the England and Wales Cricket Board as a consultant Performance Analyst, working in elite youth cricket helping with squad selection and primary data collection for 14/15/16/17 year olds as they begin their careers in the game. It feels strange to admit that 14-year-old players are being analysed with the same level of scrutiny as cricketers with 14 years of professional experience, even if they don’t know it yet! But it made me think of some very salient points to make:

Firstly, why wouldn’t you analyse elite youth cricketers at a young age? To be able to identify key components of elite youth performance that translate to future elite senior performance is vital in talent identification. Not only does it allow for early identification of players, and thus earlier prescription of strength and conditioning/sport psychology/dietetics/sports medicine regimes etc, it’s intended to (hopefully) provide further knowledge about career progression and longevity. That isn’t to say the positives will happen, it just helps you increase your chances. And there is also a need to understand that career progression/skill acquisition/performance levels – call it what you will – isn’t a linear process either. There will be those that peak at 17/18 years of age, those that peak at 25/26 and those that peak at 29/30 years of age. The perfect example of this is Gareth Batty’s inclusion in the England squad for the tour of Bangladesh at the age of 38, having played a handful of test matches in his preceding years.

Secondly, not all young players relish being analysed and held to account for their performances at a young age. Without naming names, there was one player who was playing county 2nd XI cricket (alongside professional players) who turned up to a regional age group tournament (the best 13 players from the North, London & the East, South & West, and the Midlands play each other in a round-robin format before the top two teams contest a final). I don’t need to say any more than ‘this lad can play’. However, in front of England coaches, selectors, and cameras he failed to live up to the hype. This isn’t unusual, and some Premier League footballers are accused of “going missing” when the TV cameras turn up or vice-versa. Others performed to a higher level than expected, while others wanted to know as much as possible, and after being shown how to review their own (and their peers’) performance they would come and bug you until a break in play to see how they had performed.

Part of my remit was to report back to coaches about who had engaged, and who hadn’t. This is because the ECB wants to create self-sufficient athletes, athletes who are willing to engage in critical analysis of themselves, their teammates, and their opponents. On the field, you have to make decisions that will decide the outcome of the game. For batsmen, do you use your feet to this spinner, where is your boundary coming from this over, should I target this scoring area or that scoring area? For bowlers, how am I going to get the batsman out, which delivery am I setting you up for, where should I have my fielders, have climatic conditions changed enough for me to change my approach? All of these are conscious decisions in cricket, the time you have between deliveries to think about your approach is a vital aspect of the game and those who show themselves to be good critical thinkers capable of rationalising their decisions will be hugely valuable as their career progresses.

Thirdly, the world is doomed. Doomed I tell you. This is why: some players went out to bat and played outrageously good, incredibly inventive shots with the sole purpose of filming themselves playing the shot on their phone from the analysis footage and uploading it to Instagram or some other social network. They genuinely changed how they played the game to make sure they had something to post online and show off with. We’re doomed.

Overall, consulting with the ECB has been great. While I had experience of working with professional cricket players before it was a whole new challenge when it came to working with young players. It’s something I am very proud of, and I hope to be involved for a long time to come. Hopefully some of the players I interacted with will go on to have long and fruitful domestic and international careers.

Screen Shot 2016-09-29 at 18.46.37.png

But life hasn’t stopped there. I have recently started working with the Gibraltar Football Association as Head of Performance Analysis. Gibraltar is a British Overseas Territory but competes in UEFA and FIFA competitions in its own right. The opportunity came about when I was away with the ECB, and I returned from Loughborough late on a Saturday night and got a flight to Portugal on the Sunday morning. I hadn’t set foot in Gibraltar and I was Head of Performance Analysis…

*For context, Gibraltar play their fixtures at Estadio Algarve, in Portugal. The only football pitch in Gibraltar is used by every Gibraltarian school and football team. UEFA and FIFA have said the stadium is not up to international standards.*

I landed and we got straight into our work – the upcoming fixtures? Portugal in Porto (their first international fixture since winning the European Championships), and Gibraltar’s historic first ever World Cup Qualifier against Greece. In four days I had to analyse Portugal’s tendencies in and out of possession, their attacking set-pieces, and how they defend set-pieces. Not only this, I had to film and analyse morning and evening training sessions as well as prep for Greece.

In Porto I met the Swiss FA’s Opposition Analyst who introduced himself – Kevin was his name – and we had a chat about all things performance analysis, what I was expected to do with Gibraltar and what he did with Switzerland. When he realised Gibraltar had an analyst, and GPS units, he was utterly shocked. He’d come across national teams who didn’t have an analyst, didn’t have GPS units, and certainly weren’t approaching international football as professionally as Gibraltar. I had to admit that it is still in its early stages, and I’ll need to help the coaches and players understand where my work can help (and where it can’t), and why it’s important. On the whole, the players enjoyed being held accountable – the video sessions we had were received well and nobody had a problem with accepting responsibility for shortcomings. I was delighted they saw it that way (especially after my experiences with the young cricketers), and it was refreshing to see.

I can, however, safely say I worked a lot of hours every day, and the day after the Portugal game was a nightmare. The match finished 5-0 to Portugal and it took a long time to get back to the hotel, even though it was only 200m away (safety guidance and protocols impacted this). Once back, we had dinner. After dinner I sat in my hotel room and analysed the match in as much detail as possible, and prepared to report back to the manager on the short flight from Porto to Faro. At this point I hadn’t slept all night, and fatigue was setting in. Adhering to flight safety protocol, I put my laptop away for take-off and promptly grabbed a quick 10 minutes of sleep. I then woke up, and started talking to Jeff (the manager) about the game the night before using the video and analysis to frame my points. At times it wasn’t pretty viewing.

Screen Shot 2016-09-29 at 18.45.41.png

This ad-hoc meeting went well and we had a coherent plan for how the post-match meeting would run. I stepped off the aeroplane in the Algarve, greeted by 40-degree heat, and stepped on the bus for the trip back to the hotel – during this time I busily analysed more of Greece’s match footage. Once back at the hotel I dropped off some of the equipment in my room and moved into the lobby to continue working, stopped for the team lunch before finishing off my work in the afternoon. Rather than sleep, and risk being awake all night, I relaxed in the pool for an hour before eating dinner with the team and then having the team meeting. All told, I’d managed 10 minutes sleep in 37/38 hours and I slept incredibly well that night.

To those of you not in the Performance Analysis industry, this is not uncommon. There are much worse stories than mine, but I wanted to share it because it’s one of the reasons backroom staffs have increased in size at the top end of the modern game – people aren’t willing to work crazy hours for low pay, so clubs arm themselves with opposition analysts, recruitment analysts, goalkeeping analysts, and match analysts (as well as analytics staff). When people complain about expanding staff numbers, and the old “there’s more staff than players” line gets trotted out, have some sympathy – it’s not as simple as it looks!

Basketball – It’s A Global Game

As a less-than-knowledgeable basketball fan from England (I started following the Denver Nuggets in the Anthony-Iverson era), I love reading the different articles out there about the game. They help people like me to understand the game, and give me reason to write things like this.

Now, anyone who has a passing interest in the NBA knows about the trend for teams to shoot more from beyond the 3-point line and that open looks for relatively poor shooters (say, about 30% career shooting) are more effective than contested shots from better shooters (say, 35+% career shooting). The most effective team with these new shooting trends is the Golden State Warriors – we all know this. So when I read a game plan for the Cleveland Cavaliers that included greatly improving their defensive rebounds if they were to stand any chance of beating the Warriors in the NBA Finals, it got me thinking.

Taking open shots over contested shots, and rebounding effectively on defence are fundamentals. If we extrapolate this further, it can cover other fundamentals of basketball. Ball movement becomes even more vital, coupled with good quality screens, to ensure that shooters can get open effectively. Defensively, the added defensive rebounds ensure that teams do not get second chance points.

If these are fundamental to playing successful basketball, does that translate from the glitz and glamour of the NBA to the less opulent game played in Britain? Answering the question is slightly difficult because there’s less data freely available; you’re stuck with generic box scores. Also, because it’s a bit heavy on the statistical testing this piece is a little ‘dry’.

After scraping data from 51 men’s basketball fixtures (from the BBL, Britain’s professional basketball league for men) and 51 women’s basketball fixtures (from the WBBL, Britain’s professional basketball league for women), I found the differences in frequency between winning and losing teams for every performance indicator. After that, I used SPSS to calculate the statistical differences between winning and losing for each performance indicator. You can see this plotted below.

N.B. The statistic for turnovers was inverted, so despite the graphic showing that winning teams commit more turnovers, in reality they actually committed fewer turnovers per game.

When that was completed I looked into the effect sizes for each performance indicator, which – when coupled with the information above – leaves us with the following performance indicators as the most important in the WBBL: field goal percentage, percentage of field goals missed, free throws made, defensive rebounds, total rebounds, assists, turnovers and field goals made. This suggests that committing fewer turnovers (remember, the statistic for turnovers was inverted), assisting more scorers rather than having a player score in isolation, and collecting more defensive rebounds (thus preventing the opposition from scoring from a second chance opportunity) all increase your chances of winning. Field goal percentage, free throws made and field goals made suggest that scoring more baskets also increases your chances of winning – although that is a particularly obvious observation. Something less obvious is that percentage of field goals missed has a greater effect size, and perhaps indicates that shooting less, but taking on more ‘open’ shots, is the best way in which to score in the WBBL.

For the BBL, field goals missed, field goal percentage, percentage of field goals missed, opposition field goals made, 3 point percentage, defensive rebounds and assists were found to have moderate to strong effect sizes. Combining the information with the table above shows that in the BBL, to be a winning team, your field goal percentage and 3 point percentage are most important, as they relate directly to scoring, while you also rely on the opposition missing more field goal attempts, which negatively impacts upon their percentage of field goals missed. Again, defensive rebounds are vital – this is because they prevent the opposition gaining consecutive possessions, and therefore prevent them from scoring. Assists are also within the moderate to strong effect size, which, as suggested for the WBBL, means that finding an open shot as opposed to taking on shots in isolation is important to winning performance.
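To give a flavour of what an effect size measures, here is a minimal sketch using Cohen's d, the difference between two group means expressed in pooled standard deviations. This is an assumption for illustration only: the analysis above was run in SPSS and the text does not say which effect size statistic was used. The rebound counts are invented.

```python
import math
from statistics import mean, stdev


def cohens_d(winners, losers):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(winners), len(losers)
    s1, s2 = stdev(winners), stdev(losers)  # sample standard deviations
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(winners) - mean(losers)) / pooled


# Hypothetical defensive rebounds per game (not real BBL or WBBL data).
winning_def_rebounds = [30, 28, 33, 31, 29, 34]
losing_def_rebounds = [26, 27, 24, 29, 25, 28]

d = cohens_d(winning_def_rebounds, losing_def_rebounds)
print(round(d, 2))
```

A positive value means winning teams recorded more of that indicator than losing teams. By Cohen's conventional guide, values around 0.2, 0.5 and 0.8 are read as small, medium and large effects.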

These are all things that I touched on before with regards to the NBA. While it’s nowhere near as in depth as the data from SportVU, the concept that what works in the NBA also works in Britain is of vital importance. How many coaches, of any sport, stress the need to master fundamental technique before moving on to more difficult skills? Almost all of them do, from my experience. So, the next time you have a young kid trying to run before they can walk just point out that even at the highest level the difference between winning and losing is the fundamentals.

Scott Borthwick – England’s Number Three?

With one Test Match left in the England vs Sri Lanka series there is a lot of pressure on Nick Compton to play a ‘career saving’ innings. This is not merely media speculation; Compton has admitted it publicly. The decision to drop Compton should be taken with the winter tours to Bangladesh and India in mind. When I devised game plans for Durham, I recommended we target him with spin early on – he doesn’t look comfortable against it, and can often look confused as to how he should play it. The three GIFs below are Compton getting out to spin:

If England decide to drop Compton for the upcoming series against Pakistan, who do they select? In the media there’s a suggestion that Scott Borthwick deserves a call-up, while Gary Ballance’s name was touted before the start of the Sri Lanka series. Tom Westley is another player in the frame, but the selectors tend to select only from Division One nowadays – unless, of course, you were first selected in Division One but since then your county was relegated to Division Two (e.g. Moeen Ali and Worcestershire).

So who deserves the call? It’s a difficult question to answer, mainly because Scott Borthwick used to be a number 8 who bowled and is now a front line batsman at number 3. It’s also difficult because Gary Ballance has batted at number 5 for the majority of his Yorkshire career; would his selection necessitate a change in batting order? To help us make a decision, here’s each player’s average in County Cricket (Borthwick has been separated into top order and bottom order).


Averages, however, don’t tell the whole story. Ballance played a season or two in Division Two and his average may be inflated. Borthwick plays at Emirates Riverside, which is often thought of as a difficult place to bat – does this account for his lower batting average? Nick Compton, meanwhile, played a large chunk of cricket at Taunton, which is well known for being a batsman friendly wicket. Instead, it may be worthwhile looking at their dismissals – generally speaking, good batsmen don’t often get bowled; they either hit the ball or get their pads in the way.

How Out

When Borthwick bats lower down the order he doesn’t get out LBW or Bowled very often, but when he moves up the order (and thus faces a newer ball) he gets out more in those ways. The above graph, however, is based on all innings. As Nick Compton has played the most matches of the three players, he has more dismissals full stop. If we convert those totals into a percentage for all dismissals, we get the following:

Dismissal Percentages

Nick Compton gets out bowled a lot more than both Borthwick and Ballance, which may suggest his technique isn’t good enough for Test Match cricket. The overall profiles of Ballance and Borthwick are very similar, and we know Gary Ballance averages 47.78 in Test Match cricket – it’s a good sign for Scott Borthwick.
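The conversion from raw dismissal counts to percentages is simple enough to sketch. It matters because the three players have batted a different number of times. The function name and the dismissal list below are made up for illustration and are not the players' real records.

```python
from collections import Counter


def dismissal_percentages(dismissals):
    """Share of each mode of dismissal, as a percentage of all dismissals."""
    counts = Counter(dismissals)
    total = sum(counts.values())
    return {mode: round(100 * n / total, 1) for mode, n in counts.items()}


# Hypothetical record: 20 dismissals for one batsman.
sample = ["caught"] * 12 + ["bowled"] * 5 + ["lbw"] * 3
shares = dismissal_percentages(sample)
print(shares)
```

Working in percentages means a batsman with 200 dismissals can be compared fairly with one who has 80.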

Packaging all of that information into a radar chart gives us the following image for Borthwick against Compton:

Screen Shot 2016-06-08 at 13.15.00

As already touched on, Nick Compton looks like by far the better batsman. The nagging thing for me is the difference in times he’s been bowled. That’s alarming, as far as I am concerned. If we take Gary Ballance and Scott Borthwick’s data and put it into a radar we get:

Screen Shot 2016-06-08 at 13.11.24.png

Again, the two are remarkably similar. This suggests their overall quality is very alike. So, why isn’t there a clamour to get Ballance back into the Test team? Well, for one he has a ‘unique’ technique. Ballance isn’t exactly easy on the eye, and appeared to have been ‘worked out’ the time he last played Test Cricket. He has some flaws, but has neglected to change those. As with all sports, you need to use your eyes to tell the story that stats don’t – players need to pass the eye test.

Scott Borthwick, on the other hand, doesn’t have many flaws. He scores his runs all round the wicket and doesn’t seem troubled by quick bowling. If you add in that he’s a capable spin bowler (don’t forget, he has one test cap as a front-line spinner!) and a seriously good fielder, then surely you’ve only got one option?

Borthwick has made great strides since he moved up the order, and he’s certainly got a fantastic record in County Cricket at the moment. While we know that having a terrific record in domestic cricket doesn’t always translate to a high quality international record, there’s certainly no harm in giving him a go. From personal experience, he’s a hard-working, dedicated professional who lives and breathes the game. If Nick Compton can’t salvage his career, England would do well to look North.


Yorkshire’s Gale Little More Than a Draught

Yorkshire’s record in the Natwest T20 Blast hasn’t been the best over the years. They’ve never won the competition and, despite having an incredibly talented squad stuffed full of England players and top level overseas players, have reached Finals Day once (2012) in the competition’s 14-year history. In fact, they have only qualified from the group stage three times – 2006, 2007 and 2012.

To say they need to improve their T20 cricket is an understatement, and I’ve had a look at one of their senior players to see if I can help explain it. Andrew Gale, the 32-year-old Yorkshire Club Captain, has been the T20 Opener for a long time but was dropped by the team midway through the 2015 campaign and missed their first T20 of the 2016 season vs. Leicestershire.

This decision could be the first step towards improving their T20 fortunes, and here’s why:

Screen Shot 2016-06-02 at 12.52.06

Andrew Gale ranked 130th in strike rate for all players, but 38th in total runs. These two figures taken together show that while Gale can score runs in T20 cricket, he does so at such a slow rate that the rest of the Yorkshire team have to play so recklessly that they’re unable to put together a decent total. Compared to top-ranked T20 openers, his average runs per over is vastly lower – Brendon McCullum and Chris Gayle score at over 11.50 runs per over, while Gale scores at 6.57. Yorkshire’s win rate with Gale in the team is 33%.
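Strike rate and runs per over are the same measure on different scales, so a short sketch of the conversion may help when reading the figures above alongside strike rate charts. The function names are made up for illustration. The two runs per over figures are the ones quoted above.

```python
BALLS_PER_OVER = 6


def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls


def runs_per_over_to_strike_rate(runs_per_over):
    """Convert a scoring rate in runs per over to runs per 100 balls."""
    return 100 * runs_per_over / BALLS_PER_OVER


gale_sr = runs_per_over_to_strike_rate(6.57)
elite_sr = runs_per_over_to_strike_rate(11.50)
print(round(gale_sr, 1), round(elite_sr, 1))
```

On this scale 6.57 runs per over is a strike rate of about 110, while 11.50 runs per over is a strike rate above 190.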

There’s a well-used phrase in cricket “You’ve got to get him early”. It’s often used for players who, when they’re “in” can put together big scores. For Andrew Gale, the opposite is true – I’d much rather keep him at the crease, than to get him out.

Andrew Gale T20 Balls vs SR.PNG

The graph above shows Gale’s balls faced along the horizontal axis, and his strike rate on the vertical axis. The data covers his entire T20 career with Yorkshire. You can see that I’ve got the axes set to 70 and 400, the reason will become clear very shortly. What you can see is that Gale rarely faces more than 25 deliveries, but in that time scores at roughly a strike rate of 120. Even when he does face more deliveries, his scoring rate is very low – both times he’s faced 50+ deliveries his strike rate hovers at about 100.

How does that compare to Chris Gayle, the self proclaimed Universe Boss? Well, not particularly favourably.

Chris Gayle T20 Balls vs SRte

While there are more data points available for Chris Gayle (after all, he plays T20 cricket all over the world), the sheer volume of points that exceed Andrew Gale’s data points suggests he is a far superior player – but that’s an obvious point to make; every single person who has watched the two bat could tell you that. Also, you can see that Gayle’s data points fit perfectly on the axes – his data was so extreme I had to use those dimensions for all of the players, otherwise comparisons become rather difficult. So, rather than only comparing Gale to one of the Kings of T20, I’ve also got some data for other English T20 openers. First up, here’s Jason Roy:

Jason Roy T20 Balls vs SR

You can see that Jason Roy doesn’t often face more than 40/45 deliveries, but regularly surpasses a strike rate of 100. In fact, he’s more likely to score at greater than 125. That’s a good record, but not exactly setting the world alight. How about his England T20 opening partner Alex Hales?

Alex Hales T20 Balls vs SR

What first strikes me about Alex Hales is how steep his trendline is at the beginning of his innings, and how closely his data points fit to it up to about 30 deliveries. Hales really is one of those players you need to get out early, or else he starts to motor along nicely. Like Roy, almost all of his innings have a strike rate of 125+, with a fair number of innings scored at 150+. As with Gayle, we would assume that Jason Roy and Alex Hales are going to be a lot better than Gale in T20s. But what about a player who often gets overlooked by England, and probably should have played a lot more T20 cricket; what about Luke Wright?

Luke Wright T20 Balls vs SR.PNG

Again, there’s a very good fit to the trendline with Luke Wright, and he has by far the steepest curve. When commentators say “You’ve got to get him early” he’s who they’re thinking of. Get him in his first 10 deliveries and he’ll only have scored 10. Let him face 25 deliveries and he’ll score 37/38. Unlike Roy and Hales, Wright also has a track record of batting for a long time, facing more than 60 deliveries on 4 occasions, compared to once for the two England players.

What happens, though, when we plot all the players on the same graph? Well, the results aren’t quite what you would expect:

T20 Openers Website

Luke Wright appears to be by far the most explosive batsman, reaching a far higher strike rate than the other players once they’ve all faced at least 20 deliveries. Not only that, but look at Chris Gayle! He scores much more quickly at the beginning of his innings than Hales, Wright and Gale, but the longer his innings goes on the less explosive he appears to be. Now, that might be explained by the fact that when Gayle goes big, he quite often starts very slowly (I know, that’s against what you can see in the graphic, but hear me out!). I’ve heard him interviewed and say he knows he’s on for a big score when he takes his time early on, gets his eye in, and then goes big. My data can’t show this, because it’s only a plot for completed innings. If we could see how his strike rate changes across a big score, I’m sure we’d see a sharp increase in strike rate at somewhere between 15 and 20 deliveries. But that’s only a hunch.

For now, the take-home message is this: Yorkshire need to keep Gale out of their T20 team this season. He hasn’t formally announced retirement from that format, but I would be shocked if he played much T20 this season. Well, that and you can expect some more interesting innings from Luke Wright!

Simon Mignolet – Liverpool’s Let Down?

Every time Liverpool were on TV last season, I’d spend the entire game wondering what possessed them to sign a goalkeeper like Simon Mignolet. There was nothing he did that made me think he would ever be a decent ‘keeper. It wasn’t just the fact he’s dreadful, it was that Liverpool let Jose Reina leave and thought bringing in Mignolet as a replacement was a good idea. Liverpool’s, and Jurgen Klopp’s, first signing of the new season is Loris Karius, the Mainz 05 ‘keeper – a move that indicates Klopp was just as unhappy with Mignolet as I was.

To try and show Mignolet’s lack of quality as a Premier League goalkeeper I’ve created some radar charts. Now, as I haven’t had the opportunity to collect any data myself this season, I’ve had to scrape some data from online sources. Yes, I know they’re unreliable. Yes, I know you’re all shaking your heads wondering why I’m doing this, but bear with me…

It’s about the process of making the radar charts, and how they can be used if/when I am able to produce some reliable data. There are a few issues with creating radar charts in Excel: you can’t change the scale for each ‘spoke’, and the image itself is a little bit ‘bland’. To get around these issues I firstly redrew the background, and imported it into Excel. After that, I found the average percentage of possession the opposition had and calculated what the values would have been if the opposition had 100% possession. That way you can compare goalkeepers against each other; otherwise someone like Petr Cech would appear to be a ‘lesser player’ because he’s less busy than other goalkeepers. It’s an imperfect solution, but it’s better than nothing.

The only statistic to be left unchanged was the saves per goal ratio, because it isn’t reliant on possession.
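The possession adjustment is just a scaling step, and can be sketched in a few lines of Python. This is an illustration only (the original work was done in Excel), and both the function name and the example numbers are made up:

```python
def adjust_for_possession(per_game_value: float, opposition_possession_pct: float) -> float:
    """Scale a per-game count up to what it would be at 100% opposition possession.

    Assumption: the count rises in direct proportion to how much of the ball
    the opposition has, which is a simplification.
    """
    if not 0 < opposition_possession_pct <= 100:
        raise ValueError("possession must be a percentage in (0, 100]")
    return per_game_value * 100.0 / opposition_possession_pct


if __name__ == "__main__":
    # Hypothetical 'keeper: 3 saves per game, opposition averaging 40% possession.
    print(adjust_for_possession(3.0, 40.0))
```

A 'keeper behind a dominant team gets a bigger uplift than one whose opponents already see most of the ball, which is the point of the adjustment.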

After that, to ensure that all the data were scaled in the right way I calculated the mean and standard deviation for each Performance Indicator, and then found three standard deviations above the mean. I then expressed each individual statistic as a percentage of that figure. From that, I then inverted the ‘Goals Conceded’ data (basically, I subtracted the value from 100); this means that a higher peak in the radars below is better than a lower peak. It sounds weird, I know, but it works!
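For anyone who wants to try the same scaling outside Excel, here is a rough Python sketch of the steps just described. It assumes a population standard deviation (a sample one would work just as well), and the function name and numbers are invented for illustration:

```python
from statistics import mean, pstdev


def scale_indicator(values, invert=False):
    """Express each value as a percentage of (mean + 3 standard deviations).

    Set invert=True for indicators where lower is better (such as goals
    conceded), so that a higher peak on the radar always means 'better'.
    """
    ceiling = mean(values) + 3 * pstdev(values)
    if ceiling == 0:
        raise ValueError("cannot scale an indicator whose ceiling is zero")
    scaled = [100.0 * v / ceiling for v in values]
    if invert:
        scaled = [100.0 - s for s in scaled]
    return scaled


if __name__ == "__main__":
    saves = [2.0, 4.0, 6.0]      # hypothetical adjusted saves per game
    conceded = [0.8, 1.2, 1.9]   # hypothetical goals conceded per game
    print([round(x, 1) for x in scale_indicator(saves)])
    print([round(x, 1) for x in scale_indicator(conceded, invert=True)])
```

Using three standard deviations above the mean as the ceiling leaves headroom on every spoke, so even the best performer rarely sits right on the edge of the chart.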

Mignolet vs Lloris

Above is the radar chart for Mignolet plotted against Hugo Lloris. Note the extent to which Lloris makes more saves (and the position those saves came in), and how many more saves per goal he makes. You could also make a case for Lloris being more dominant in his area than Mignolet by the higher frequency of punches he makes.

Mignolet vs Schmeichel

Compared to Kasper Schmeichel, Mignolet makes far fewer catches. Again, this suggests that Mignolet isn’t particularly dominant in the air. Schmeichel also makes a high proportion of his saves in the 6 Yard Box, which could imply that Schmeichel’s reactions are superior to Mignolet’s as saves in that area are likely to be from close range shots – although that’s purely speculation at this point.

Mignolet vs Mannone

Vito Mannone, Sunderland’s goalkeeper, is one of the best keepers in the league according to the statistics I’ve used. Despite adjusting the data for possession, he comes out on top against almost all of the ‘keepers in the Premier League for saves and saves per goal. He’s also dominant in his box, and comes out to catch the ball often. Compared to Mignolet, Mannone is a beast.

All of that helps show why I don’t rate Simon Mignolet, but it’s also pertinent to mention – again – the fact that the data used here is scraped from the Internet and cannot be controlled for reliability. Take it with a pinch of salt! But now we have to move on to Karius and how he compares to Mignolet.

Mignolet vs Karius.png

Karius, using the radar chart above, looks to be a better shot stopper than Mignolet. His numbers are all higher, and he has a much better shots per goal ratio than Mignolet. However, he has lower scores for catches and punches, which could suggest that he is even less dominant in his area than Mignolet – which is a worry. We all saw David de Gea’s first few seasons in the Premier League, when he struggled to cope with the physicality in the penalty area.

Foreign goalkeepers need the time to become accustomed to the Premier League, and it takes more than simply time in the gym to improve. Karius, at 22, has plenty of time to learn and improve and he could well be Liverpool’s number 1 for years to come – but don’t be surprised if teams put in crosses that force him into action and he ends up having similar struggles to de Gea.

Alan Pardew – One Season Wonder?

Alan Pardew shouldn’t get a new contract with Palace. But, contrary to the opinion of most West Ham, Charlton, Southampton, and Newcastle fans, he’s likely to get one. As a Newcastle United fan you might think I have an axe to grind with this latest piece, but I don’t. I just want to use a little bit of data to explain my thinking here.

He’s a one-season wonder at every club he’s ever managed, with the exception of Reading.

To make my point here I’ve created some “momentum charts” – now I don’t want to fool anyone, they aren’t a true measure of momentum. That sort of information is incredibly difficult to measure, and as far as I can tell the only company in the world that’s able to measure it accurately is Sports Wizard. Momentum charts are, however, used widely within Rugby. You add one point for a positive action and subtract one point for a negative action, or if the opposition performs a positive action.

With regards to football: a win is plus one, a draw is no change, and a loss is minus one. Without further ado, here’s Pardew’s momentum chart in the league (no cup competitions included) for every full season he’s been a manager:
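The scoring rule above amounts to a running total over a season’s results. A minimal Python sketch, using a made-up run of results rather than Pardew’s actual record (the names are illustrative):

```python
from itertools import accumulate

# Win = +1, draw = no change, loss = -1, as described above.
POINTS = {"W": 1, "D": 0, "L": -1}


def momentum(results):
    """Return the running momentum score after each league game."""
    return list(accumulate(POINTS[r] for r in results))


if __name__ == "__main__":
    season = "WWDLLW"  # hypothetical sequence of results
    print(momentum(season))
```

Plotting that list against the game number gives the momentum chart; the final value is the season’s finishing score.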

Full Seasons Pardew

As far as you can tell, he’s created positive momentum in almost all of his full seasons. However, at least five of those positive seasons are from the Championship or League One. Very few positive seasons are from the Premier League.

If we take only Pardew’s Premier League seasons we get this:

Full Premier League Seasons.png


Alan Pardew has managed one positive season in his five full seasons as a Premier League manager – the year he won manager of the year and Newcastle finished fifth: 2011/12. I’ve disregarded his West Ham 05/06 season, where he finished on +1, because it’s so close to zero. Somehow I don’t think West Ham fans would have been too happy with that result. Have we proved that Pardew is a one-season wonder yet? Not quite, but we’ll get there.

His record as Reading manager was quite positive, as you can see below. But don’t forget that these seasons were in the Championship or League One, with a relatively well-funded club compared to most in the division. There’s no denying that he did well as a manager for them.

Reading Pardew.png

After his successful stint at Reading, Pardew moved on to West Ham. He achieved promotion, but then was sacked when he had Carlos Tevez and Javier Mascherano in his West Ham squad. His two seasons in the Championship were his most positive when he had a wealthy club and was expected to achieve promotion. In the Premier League he managed a poor season in 05/06 (as already discussed) and then reached a momentum score of -7 before being sacked by West Ham. One good season in the Premier League.

Pardew West Ham.png

Having been sacked by West Ham, Pardew moved across London to Charlton Athletic. They were already in dire straits, and he was unable to save the club from relegation. His momentum score never got above +1, and Pardew wasn’t able to achieve the short-term lift in performance a lot of new managers do, eventually finishing with a -2 momentum score.

In their first season in the Championship, Charlton were doing relatively well after 36 games with a +5 momentum score. As you can see, however, their form nosedived after that and he finished on +1 with a win on the last day of the season. This was Pardew’s only ‘decent’ year with Charlton. In the 08/09 season Pardew was sacked after fewer than 20 games, with a momentum score of -5. He hadn’t been able to turn around the club’s fortunes after their nosedive at the end of the previous season.

Pardew Charlton.png

After Charlton came Southampton, where he had the biggest budget and highest expectations in League One. Pardew’s time at Southampton is more difficult to assess as he only managed one season and three games. His 09/10 season with Southampton would have resulted in promotion had they not been deducted 10 points for entering administration, and he was sacked after three games the following season. I’ll leave it up to you to decide whether or not he was treated harshly.


Now we’ve reached the period in Pardew’s managerial career that I was most invested in – his time at Newcastle United. He broke all kinds of records: worst league form in the club’s history, worst record vs. Sunderland of any manager in Newcastle’s history. I could go on, but I won’t.

There’s only one season with positive results. I’ve already mentioned it, the 11/12 season where Papiss Cisse was defying the laws of physics, Demba Ba was banging in the goals, Yohan Cabaye was dictating terms in midfield, and Hatem Ben Arfa was allowed to flourish. That team was a joy to watch, believe me. If you fast-forward to the following season, Pardew had his worst season at Newcastle, finishing on a momentum score of -9. As a Newcastle fan, I can tell you with some conviction that we were shocking. He’d completely changed the style of play from all-out attack to long balls to Cisse, who is definitely not a target man. Instead of letting Ben Arfa weave his way through defences scoring goals that only the likes of Lionel Messi were supposed to score, Pardew insisted he should become a defensive winger. Ben Arfa never regained his confidence in England, but has terrorised French defences in Ligue 1 this season for Nice and was even linked with a transfer to Barcelona. What was Pardew thinking? Only he knows…

Eventually he left of his own accord, something every average football watcher thought was the best decision Pardew ever made. He’d get away from the “southern hating Geordies” and flourish. These people forget that we adore Les Ferdinand, Chris Hughton, Rob Lee, Warren Barton, etc. As a Newcastle fan it doesn’t matter where you’re from; what matters is how the team performs on the pitch.

Pardews Newcastle.png

So it’s time to move on to Palace. He took over mid-season and kept Palace up in 14/15, and I would like to say that’s probably going to be his only good season with them. He finished on a score of +3 that season. It was a different story in 15/16, with Pardew’s team completing their usual nosedive in form. With a maximum momentum score of +3 achieved after nineteen games, Palace finished on -7.

Palace Pardw.png

If his managerial history is anything to go by, Palace fans are in for several relegation battles in the next few seasons. Apart from spending time with relatively well-financed lower league teams, Alan Pardew is incapable of creating a team that will continually challenge for mid-table positions. His spells at Reading and Southampton were his best as a manager, and those were in League One. Maybe that’s his level; maybe it’s the Championship – it’s definitely not the Premier League.

Using Stats in Football – It’s Not a One-Size Fits All Approach

Football is my passion, my chance once (sometimes twice) a week where I get to shout and scream at the TV or stand up in the stands with 52,000 others just like me getting carried away with what’s on the field in front of us. Nobody bats an eyelid; nobody looks at you thinking you’re a madman. The only trouble with that is my job, my qualifications, all say I am a Performance Analyst – one of those people “ruining football” if you read the newspapers or listen to phone-ins.

Now, I have never worked in professional football as an analyst – I’ve always thought it would be difficult for me to reconcile my love of being at a game with the need to analyse every action that I see, stopping me getting emotional about what’s happening on the pitch and thus losing the joy of it all. So this, for me, is a big thing. I’m about to write about stats in football, why they matter and why I spend a lot of my time irritated and annoyed about the bastardised form of analysis the public have to swallow.

Obviously there are people who write a lot about this kind of stuff on the Internet, and do it brilliantly I might add. But nevertheless I feel the need to add to what’s out there by doing more than simply saying, “this is what they do”. Hopefully, I’ve given you a few other thoughts along the way as well.

Everyone seems to be writing about Leicester City at the moment, and why not? They’ve just completed a terrifically successful season and become the first new champions of the English top flight since Nottingham Forest. But how and, most importantly, why did this happen?

You’ll no doubt have heard they’re the team with the lowest percentage of possession to win the league – it’s all anyone goes on about. That, and their use of the “out-dated” 4-4-2 system, coupled with the “pre-historic” long balls they keep playing. Well, so what? Where does it say in the rules of football that the game has to be played in one specific, aesthetically pleasing, way? For me, it comes down to this – pundits, journalists, or “experts”, call them what you will – people don’t know how to use the numbers properly.

Luckily, I follow a more ‘enlightened’ bunch of people on Twitter, and revel in their withering put-downs of the partially informed.

Teams like Barcelona (2008-12), Bayern Munich (2014/15), Spain (2008-12), Germany (2013-present), Arsenal (Wenger era) all keep the ball, looking to stretch defences to move them out of position and rely on their ball control and passing ability to win matches. Their numbers for possession, total passes, passes completed, key passes, and through balls etc. are astronomical.

It must be that their way is the best way, surely?

Not for me. Yes, they’re magnificent teams that play the sort of football you’d happily pay obscene prices to watch – but that doesn’t mean they can’t be beaten. Inter Milan (2009/10) and Chelsea (2011/12) both beat Barcelona and went on to win the Champions League. Atletico Madrid (2011-present) keep achieving greater and greater things with every season and are in this year’s European showpiece. These aren’t lucky teams in the same way you could say Leicester have been lucky (although I’d rather point out that they’ve been magnificent in their adherence to a game plan). They aren’t “cheating” their way to the top, as Bayern’s Vidal claimed after they lost to Atletico’s brilliant performance.

To defend as magnificently as Atletico is just as skilful as the passing and movement of Barcelona; it’s just a different skill. To deride them as cheats, as lesser players, is to suggest that defending isn’t as important as attacking. Just ask Newcastle’s entertainers (the team I grew up watching as a child) if all-out attack is always the best way. If it was, we’d have won the league in ‘96 and we probably wouldn’t be in the mess we are now with a second relegation from the Premier League confirmed. We played breath-taking, entertaining, joyous football but didn’t win the league. Would I have taken a few more 1-0s and a few fewer 4-3s to make sure that happened? Of course I would.

The way I see it is that most people are looking at the game the wrong way. Although that’s not to say I’m right either, I just have an opinion. As a Newcastle United fan, I’ve used some examples from the thumping 5-1 win vs Spurs on the last day of the season. It’s the only high point of a dreadful term, so I had to use it, didn’t I?

The matter at hand: to my way of thinking teams are only relevant to themselves, and their data is merely a snapshot in time to that specific team. A good example to think of is this: can you truly compare players from the 1950s to players today when the game is so different? You can take it to a player level too: is it right to compare Lionel Messi to Jamie Vardy? Well, you can try… but it’s apples and oranges, just look at the differences in radar charts of different styles of players in the same position (in fact, just read the whole thing – it’s excellent). They have different roles; they play in different teams, in different leagues and are light-years apart in terms of on the ball skill. But their effectiveness? That’s where they are similar – they both contribute hugely to the success of their teams.

Inter Milan, Chelsea, Atletico, Leicester et al., they all have one thing in common – they zigged when everyone else zagged. But they weren’t reinventing the wheel. They maximised the quality in their squads, recruited players that fitted into their system, had players that shared common goals and bought into a different way of playing that exploited their strengths. They didn’t try to be Barcelona, Bayern Munich, Spain, Arsenal or Germany – they realised they couldn’t be, and beat them all anyway.

What that means is that they used the data at their disposal properly.

An example of this is the mythical “Zone 14” – the section of the pitch highlighted below. To the untrained eye, it’s where the most effective passes come from – generally speaking, if you pass the ball from there into the penalty area you’re more likely to score goals. Historically the numbers say that passes from there lead to the most goals, so it must be the best yes?

Zone 14

My view on Zone 14 is this – it’s a great place to pass from, if circumstances allow, but is most effective when you have two terrific wide players that stretch the defence wider than they want to be. Nowadays with analysts at every club, people know it’s the best place to pass from and defend differently (a la Leicester and Atletico), which makes the area much more condensed and difficult to break through. In doing this though, you can often lose sight of the most vital question analysts have to face: “so, how do you beat them?”

Well, the answer is obvious isn’t it? If you aren’t Barcelona, Bayern et al., but you have two fantastic crossers of the ball (either overlapping full backs or dynamic wingers), then get the ball into the space out wide; bring it to the byline and cross it in! Think of Arsenal vs Leicester this season – Arsenal got the ball wide and into space, crossed it in and caused mayhem. Giroud’s disallowed goal, Walcott’s equaliser, Welbeck’s winner (albeit from a free-kick) were all from crosses in the wide areas with an incredibly densely defended penalty area.

Arsenal vs Leicester

This has the double benefit of bringing Zone 14 back into play – because you’ll have executed the wide plays well, the defence will try and prevent the wide players getting the ball. This means there’s greater space to pass the ball into from Zone 14. BUT it’s difficult to manage: almost all analysts will tell you how difficult it is to create scoring chances when the ball is out wide – I think this has a lot to do with the use of inverted wingers, and a decreased importance placed on coaching players to cross the ball effectively.

Most number crunchers will say that crossing the ball is a fool’s errand, because it doesn’t often create chances (the average conversion rate is 1.76%) and is easier to defend against. But that’s because none of the data currently available tells you why some of the league’s best players (Ozil, Silva, et al.) encourage crossing from the very edge of the penalty area (think of someone like Zabaleta overlapping at pace, and crossing the ball low into the area between the six yard box and penalty spot).

Marek Kwiatkowski has an opinion as to why – to both paraphrase and elaborate: better teams won’t swing in the incredibly low percentage crosses that most analysts (including me) hate; they instead have a preference for low crosses into high quality shooting areas. Does this mean they’re not effective teams purely because they’re crossing the ball? No. What it means is they’re using their skill and technical ability to fashion better scoring opportunities for their colleagues than the average Premier League footballer. Moreover, it also doesn’t take into account where the ball ends up once a team ‘clears’ the ball – it could result in a corner, which you score from, or it falls to an attacking player who scores or assists because the cross led to a disorganised defence – these things may not be easily measured, but should definitely come into the conversation about whether or not crossing is a good/bad attacking ploy. The clips below show how a good cross can create a high quality chance, even if it isn’t directly scored from:

Good Cross 1

Good Cross 2

Alternatively, you beat the clogged up Zone 14 by never letting the opposition get into position. Again, something Leicester and Atletico do very well. Their use of the counter attack, and the long ball, are devastating because most teams nowadays only seem to be able to defend against pale imitations of ‘tiki taka’ football.

Griezmann Goal

One tactic relies on highly skilled players, the other relies on pace, directness, and a high percentage of shots on target from relatively few opportunities. This doesn’t mean to say that one is better than the other; it’s just a different way of combating new defensive structures. You need to tailor your analysis and your game plans to the players at your disposal – again, think back to the Messi vs Vardy comparison. If we’re going to get even more into the analysis, what about the effect of playing in the two ways I’ve just described?

Well, and huge thanks to statsbomb for being an incredible source of football knowledge, we know the most effective areas to score from (shown below using an expected goals (XG) method of evaluating shot quality). Luckily, Ted Knutson of recent Soccer AM fame (the author of the article I’ve linked to, and the source of the images below) is a clever bloke who understands the need for context and says “All shots in football are NOT equal” in an article about PDO. This is an important point to make when using XG data. Just because you shoot from an area that generally has a lower expected goals doesn’t mean it’s a bad thing to do. It also doesn’t mean that shooting from an area of higher expected goals is the best thing to do. It relies on context – and that you can get from video.

You could always include a subjective measurement of shot quality, or objectively include the number of players in between the shooter and the goal to augment the XG outcome; it would be time consuming I know, but surely someone at a Premier League club could do it for their own players if they didn’t fancy doing it for every team in the league? Luckily that’s for better analysts than I to sort out! To visualise this point, which of the shots below is of better quality taking into consideration the number of players in the way?

Screen Shot 2016-05-17 at 18.49.53

Anyway, the point remains that sometimes the numbers don’t tell the whole story. I remember reading a paper on Aussie Rules (a passion of mine after spending time living in the country playing cricket and studying), and they found that increasing the number of rebound 50s (a rebound 50 is when a defensive player moves the ball out of their defensive 50 metre area into midfield) had a negative correlation with points scored.

From a purely statistical point of view, doesn’t that mean teams should keep the ball in their defensive area to ensure they win? Well, obviously it doesn’t. It’s just an indicator of who was on top in the game – more rebound 50s = more time spent defending, thus less chance of winning the game. Anyone with even a foreigner’s understanding of the game can work that out. Scale that up and, while I think there are some wonderful bits of statistical research being completed within football, the true implications aren’t being reported – there’s something lost in translation.

I’ve always worked on the KISS principle – Keep It Simple, Stupid! But from a football watcher’s point of view, I don’t see many teams doing this. I see them getting lost, unable to work out how to beat these compact defences with the talent they have in their squad; devising plans that their players cannot possibly implement, but going for it anyway. I get the feeling that managers aren’t being told the meaning behind the numbers, just the numbers themselves, and that isn’t good practice.

The numbers provide competitive edges, and analysis departments will only continue to increase in importance if used correctly; and if you want any more evidence from a much more well informed source – here’s a link to another great Ted Knutson article to help you.

To summarise, stats in football are great so long as the people using them know how to do it properly. If all you can tell people are the numbers, as opposed to how you apply them to the real footballing world, you may as well not bother doing the analysis at all. What I currently see as a “one size fits all” approach is not what’s needed to turn great numbers into great on-field results. It’s a coherent plan, with numbers used in conjunction with knowledge of football.

UPDATE – Not long after I posted this, another Ted Knutson piece was posted online; with this quote seeming to say exactly what I’m getting at (although it’s in reference to managers it’s still relevant):

“The big thing to take away from it is whether a particular coach fits the style of play your club wants to play and/or how your personnel might fit into their new style of play. If you have to bin half your squad simply by hiring a new head coach, maybe you want to look a bit closer at some other coaches whose style doesn’t require quite so much immediate, expensive change.”

The Case for Hales

The trigger for writing this came from a Guardian article about Alex Hales’ ability to adapt to Test Match cricket; more specifically it was this quote:

“At Test level the best players generally bat the same way in every innings without bending too much to the situation; they know exactly how they will score runs and how aggressive they will be.”

It struck a chord with me, in as much as visually the top players seem to do the same thing every time they bat. Jacques Kallis was often derided by the South African media for this stubborn approach to playing the game – especially if his predictable tempo meant that South Africa hadn’t scored runs quickly enough to set up a winning position.

But, using stats, can we prove that Hales isn’t consistent enough compared to the other players in the England Performance Squad? Firstly that requires a measure of consistency, which I believe is best described (in a cricketing sense) using a player’s strike rate, percentage of runs scored in boundaries, balls faced, and runs scored. There are, of course, some caveats to that.

If we’re trying to assess a player’s consistency, should we use every innings a player has – thus including their golden ducks (when they get out to their first delivery) – or is it better to use only innings that reach a specific number of deliveries/runs, in a similar way to excluding players from statistical groups in football if they haven’t reached a certain number of minutes played?

As we’re talking about opening batsmen it would be difficult to include all of their lowest scores. This is because they often face the most difficult conditions in the match – fresh bowlers, a new ball, and the added psychological pressure of opening.

In this instance, I’ve chosen to discard all innings where the batsman hasn’t reached 30. The rationale for this is that there is an expectation in timed cricket (cricket lasting longer than one day) that once a player reaches 30 they should be able to capitalise on that start and go on to score a large quantity of runs.

Another proviso with the stats is that I’ve generated an average for all innings of 30 or more runs. However, this hasn’t been done in the traditional way where the average is runs scored divided by the number of times dismissed. That may be a fair reflection of one’s ability to score runs without being dismissed, but it removes game context.

For me, if you average 100 runs because you bat low down the order and are rarely out then that average is worthless. By the same token, if you continually score 100s but do so in such a slow way that your team never has a chance of winning, why should you be rewarded for that? I’ve taken the view that by removing all scores of less than 30, it is only right to treat each innings as equal and thus the average is created with total runs divided by total innings.
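To make the difference between the two averages concrete, here is a small Python sketch. The scores are invented, the function names are my own, and the 30-run cut-off is the one described above:

```python
def qualifying_average(scores, threshold=30):
    """Average over innings of `threshold` runs or more: total runs / total innings."""
    qualifying = [s for s in scores if s >= threshold]
    if not qualifying:
        return None  # no innings reached the threshold
    return sum(qualifying) / len(qualifying)


def traditional_average(scores, dismissals):
    """Conventional batting average: total runs / number of times dismissed."""
    if dismissals <= 0:
        raise ValueError("a traditional average needs at least one dismissal")
    return sum(scores) / dismissals


if __name__ == "__main__":
    scores = [0, 12, 30, 45, 105]  # hypothetical innings, one of them not out
    print(qualifying_average(scores))                  # innings of 30+ only
    print(traditional_average(scores, dismissals=4))   # all innings, not outs excluded from the divisor
```

With these made-up scores, the not-out innings inflates the traditional figure, while the 30+ average treats every qualifying innings equally.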

NB – I’ve used data from 2012 to 2015 in the County Championship and I’ve also included Mark Stoneman, as there was a clamour to get him involved in the England squad, and Adam Lyth because he was the most recent England opener not included in the EPS.

After all that, we’ll jump right in.

The box and whisker plot shown below is for Strike Rate. It shows the minimum and maximum values (the top and bottom “whiskers”), the upper and lower quartiles (the top and bottom of the box), and the mean (the thicker black line). But how do we interpret it?

Well, the first thing to recognise is that the closer together the plot is, the more consistent a player is in terms of their strike rate – a measure of how quickly they score their runs. What stands out is how consistent Zafar Ansari is; the man is unflappable compared to the others. We know this because his interquartile range (IQR) is the smallest, as is his absolute range. Mark Stoneman, on the other hand, varies his strike rate massively compared to the others. His interquartile range is the largest, and he also has two huge outliers (instances greater than the 75th percentile + 1.5 * IQR). Of the five batsmen, I would suggest that Ansari is the most consistent, Stoneman the most inconsistent, and Hales the second most inconsistent.

Strike Rate Openers
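For anyone who wants to reproduce the numbers behind a plot like this, the sketch below computes the quartiles, the IQR, and the outlier cut-offs using the 75th percentile + 1.5 * IQR rule mentioned above. The strike rates are invented, the lower cut-off is the usual mirror image of the upper one, and the quartile method (`"inclusive"`) is an assumption; the software used for the viz may calculate quartiles slightly differently.

```python
import statistics


def box_plot_summary(values):
    """Return quartiles, IQR, outlier fences and outliers for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
    }


# Hypothetical strike rates, for illustration only
strike_rates = [40, 45, 48, 50, 52, 55, 58, 60, 110]
summary = box_plot_summary(strike_rates)

print(summary["iqr"], summary["outliers"])  # 10.0 [110]
```

A smaller `iqr` means a tighter box, which is the sense of “consistent” used in this post.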

It’s a good first step, but it’s only one measure. What other ways are there of assessing Hales’ performances?

Hales is an opening batsman, and as an opener myself I know that your main job is to ‘see off’ the new ball. In essence, you need to face as many deliveries as you can, to the point where the ball stops swinging and runs are scored more easily. So, with another box and whisker plot, we can see that Ansari is again top of the pile in consistency. His minimum balls faced is much higher than the other players’, and his maximum balls faced is also a lot higher than anyone else’s. While Ansari’s IQR is the largest in this selection of players, because his minimum value is so high I’ve decided not to hold this against him – an opener’s job is to face a lot of deliveries and he clearly does this consistently. Unfortunately for him, Mark Stoneman is again the least consistent performer, with the lowest average balls faced, the lowest minimum value, and a similar maximum value to those in third (Hales) and fourth place (Lyth).

Balls Faced Openers

This still doesn’t help us reach a final conclusion, as Hales has finished smack bang in the middle of the pack in terms of how many deliveries he can face.

Well what about how many runs he scores? Surely that’s the only measure that matters? The more runs a player scores, the better – that’s an easy one. But remember that this is based on all innings of 30 or greater, so it’s also a test of a player’s ability to capitalise once they’ve gotten past the hard bit – the new ball.

You can see that Alistair Cook has the smallest IQR, and the smallest gap between his minimum runs scored and lower quartile. Cook’s upper quartile is roughly 130, and he has an outlier at circa 180. This suggests that not only is Cook consistent in his run scoring, but when he scores runs, he scores heavily. But what about Hales? Well, he has by far the greatest range, and the highest maximum value for runs scored (as well as three outliers). His IQR, however, leaves him fourth for consistency; again, only Mark Stoneman’s IQR is larger.

This isn’t as much of an issue when you see that his average is by far the highest, close to 75, in all innings when he reaches 30. We can state with some conviction that if Hales gets going in a Test Match this summer he should, based on previous performance, go big and make some hundreds.

Runs Scored Openers

Finally, we’re looking at the percentage of runs scored in boundaries. This is something referenced in the Guardian article, with the criticism of Hales being that he often can’t decide whether to attack or defend. Well, Alistair Cook is the most consistent performer by this measure – his IQR is by far the smallest, and this equates well with what a lot of commentators and journalists say when asked about him. He is Mr Dependable, and knows his game so well that he often scores from only three shots (the cut, pull and a ‘punchy’ drive through the off side). Hales, on the other hand, has a marginally better IQR and overall range than Mark Stoneman – leaving Hales in fourth spot again.

Percentage of Boundaries Openers
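For reference, the two rate measures used in these plots can be worked out per innings as in the sketch below. This assumes the usual definitions: strike rate as runs per 100 balls faced, and boundary percentage as the share of a batsman’s runs that came from fours and sixes. The function names and the example innings are hypothetical.

```python
def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return 100 * runs / balls if balls else 0.0


def boundary_percentage(runs, fours, sixes):
    """Percentage of runs that came from fours and sixes."""
    return 100 * (4 * fours + 6 * sixes) / runs if runs else 0.0


# Hypothetical innings: 80 runs from 160 balls, with 10 fours and 1 six
print(strike_rate(80, 160))            # 50.0
print(boundary_percentage(80, 10, 1))  # 57.5
```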

All of this information comes back to one question: “So what?”

Well, in my opinion, Hales is a marginally more consistent performer than Mark Stoneman, but Zafar Ansari is probably the best option if you’re looking for consistent performance. However, that isn’t the end of it. What about the make-up of the team? The need for a fast scorer to blend well with Alistair Cook’s methodical approach is vital when we talk about Hales’ inclusion. Trevor Bayliss is keen to have two stroke makers in the top three, and you can see why England have kept faith with Hales.

His upside is so great that giving him the time to find his feet in international cricket is likely to pay huge dividends in the future. While it may take Hales longer than Cook, Ansari, and Lyth to make his first century for England, don’t be surprised if he turns in a 150+ score or if he hits centuries in consecutive Tests. Yes, the numbers show he’s less consistent, but the potential to play match-winning innings is something you simply cannot ignore.