The trigger for writing this came from a Guardian article about Alex Hales’ ability to adapt to Test Match cricket; more specifically it was this quote:

*“At Test level the best players generally bat the same way in every innings without bending too much to the situation; they know exactly how they will score runs and how aggressive they will be.”*

It struck a chord with me, inasmuch as the top players visibly seem to do the same thing every time they bat. Jacques Kallis was often derided by the South African media for this stubborn approach to playing the game – especially if his predictable tempo meant that South Africa hadn’t scored runs quickly enough to set up a winning position.

But, using stats, can we prove that Hales isn’t consistent enough compared with the other players in the England Performance Squad (EPS)? First, that requires a measure of consistency, which I believe is best captured (in a cricketing sense) by four numbers: a player’s strike rate, percentage of runs scored in boundaries, balls faced, and runs scored. There are, of course, some caveats to that.
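The two rate-based measures are easy to work out for a single innings. Here’s a minimal sketch, assuming each innings is recorded as runs, balls faced, and counts of fours and sixes (the `Innings` record and its field names are hypothetical, not the format of the data used in this piece), and assuming every boundary is worth exactly 4 or 6 runs:

```python
from dataclasses import dataclass


@dataclass
class Innings:
    """One innings; a hypothetical record layout for illustration only."""
    runs: int
    balls: int
    fours: int
    sixes: int


def strike_rate(inn: Innings) -> float:
    """Runs scored per 100 balls faced."""
    return 100.0 * inn.runs / inn.balls


def boundary_percentage(inn: Innings) -> float:
    """Percentage of the innings' runs that came from fours and sixes.

    Assumes runs > 0, which always holds once scores under 30 are discarded.
    """
    boundary_runs = 4 * inn.fours + 6 * inn.sixes
    return 100.0 * boundary_runs / inn.runs


# Invented innings: 84 runs from 150 balls, with 12 fours and 1 six.
example = Innings(runs=84, balls=150, fours=12, sixes=1)
print(round(strike_rate(example), 1))          # 56.0
print(round(boundary_percentage(example), 1))  # 64.3
```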

If we’re trying to assess a player’s consistency, should we use every innings they have played, and thus include their golden ducks (when they’re out to their first delivery)? Or is it better to use only innings that reach a specific number of deliveries or runs, in the same way that football statistics often exclude players who haven’t reached a certain number of minutes played?

As we’re talking about opening batsmen, it would be hard to justify including all of their lowest scores. This is because they often face the most difficult conditions in the match – fresh bowlers, a new ball, and the added psychological pressure of opening.

In this instance, I’ve chosen to discard all innings where the batsman hasn’t reached 30. The rationale for this is that there is an expectation in timed cricket (cricket lasting longer than one day) that once a player reaches 30 they should be able to capitalise on that start and go on to score a large quantity of runs.

Another proviso with the stats is that I’ve generated an average for all innings of 30 or more runs. However, this hasn’t been done in the traditional way where the average is runs scored divided by the number of times dismissed. That may be a fair reflection of one’s ability to score runs without being dismissed, but it removes game context.

For me, if you average 100 runs because you bat low down the order and are rarely out, then that average is worthless. By the same token, if you continually score 100s but score them so slowly that your team never has a chance of winning, why should you be rewarded for that? Having removed all scores of less than 30, I’ve taken the view that it is only right to treat each innings as equal, so the average is total runs divided by total innings.
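To make the difference between the two averages concrete, here’s a minimal sketch that applies the 30-run cut-off and then computes both on the same innings. The scores are invented, and the `(runs, dismissed)` pairs are a hypothetical record format:

```python
# Each innings is (runs, dismissed); invented numbers for illustration only.
innings = [(12, True), (45, True), (103, False), (31, True), (0, True), (67, False)]

THRESHOLD = 30

# Keep only the innings in which the batsman reached the threshold.
qualifying = [(runs, out) for runs, out in innings if runs >= THRESHOLD]

total_runs = sum(runs for runs, _ in qualifying)
dismissals = sum(1 for _, out in qualifying if out)

# Traditional average: runs divided by times dismissed, so not-outs inflate it.
# (Undefined if the batsman was never dismissed.)
traditional = total_runs / dismissals if dismissals else None

# Per-innings average: every qualifying innings counts equally.
per_innings = total_runs / len(qualifying)

print(f"{traditional:.2f}")  # 123.00
print(f"{per_innings:.2f}")  # 61.50
```

The two not-outs (103 and 67) double the traditional figure relative to the per-innings one, which is exactly the distortion the per-innings approach sets out to remove.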

NB – I’ve used County Championship data from 2012 to 2015. I’ve also included Mark Stoneman, as there was a clamour to get him involved in the England squad, and Adam Lyth, because he was the most recent England opener not included in the EPS.

After all that, we’ll jump right in.

The box and whisker plot shown below is for strike rate. It shows the lowest and highest values (the bottom and top “whiskers”, with any outliers plotted separately beyond them), the lower and upper quartiles (the bottom and top of the box), and the median (the thicker black line). But how do we interpret it?

Well, the first thing to recognise is that the more compact the plot, the more consistent a player is in terms of strike rate, a measure of how quickly they score their runs. Zafar Ansari stands out immediately: the man is unflappable compared to the others. We know this because both his interquartile range (IQR) and his overall range are the smallest. Mark Stoneman, on the other hand, varies his strike rate massively compared to the others. His IQR is the largest, and he also has two huge outliers (values more than 1.5 × IQR above the upper quartile, i.e. the 75th percentile). Of the five batsmen, I would suggest that Ansari is the most consistent, Stoneman the most inconsistent, and Hales the second most inconsistent.
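The numbers behind a box plot, including the outlier rule mentioned above, can be reproduced with Python’s standard library. This is a minimal sketch: the strike rates are invented, and because quartile conventions differ between tools, using `method="inclusive"` (the same linear interpolation that R and NumPy use by default) is an assumption about how the plots were drawn. It also applies the matching lower fence, which the standard rule uses alongside the upper one:

```python
import statistics


def box_plot_summary(values: list[float]) -> dict:
    """Quartiles, IQR, whisker ends and outliers using the 1.5 x IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    inliers = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points inside the fences.
        "whisker_low": min(inliers),
        "whisker_high": max(inliers),
        # Absolute range still includes any outliers.
        "range": data[-1] - data[0],
        "outliers": outliers,
    }


# Invented strike rates for one batsman's innings of 30 or more.
rates = [48.0, 52.5, 55.0, 57.2, 60.1, 61.0, 63.4, 95.0]
summary = box_plot_summary(rates)
print(summary["outliers"])  # [95.0]
```

Here the upper whisker stops at 63.4 rather than at the maximum, because 95.0 sits beyond the upper fence and is drawn as a separate point.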

It’s a good first step, but it’s only one measure. What other ways are there of assessing Hales’ performances?

Hales is an opening batsman, and as an opener myself I know that your main job is to ‘see off’ the new ball. In essence, you need to face as many deliveries as you can, until the ball stops swinging and runs come more easily. So, with another box and whisker plot, this time for balls faced, we can see that Ansari is again top of the pile for consistency. His minimum balls faced is much higher than the other players’, and his maximum is also a lot higher than anyone else’s. While Ansari’s IQR is the largest in this selection of players, his minimum value is so high that I’ve decided not to hold this against him – an opener’s job is to face a lot of deliveries, and he clearly does this consistently. Mark Stoneman is again the least consistent performer, with the lowest average balls faced, the lowest minimum value, and a maximum similar to those of Hales (third) and Lyth (fourth).

This still doesn’t help us reach a final conclusion, as Hales has finished smack bang in the middle of the pack in terms of how many deliveries he can face.

Well what about how many runs he scores? Surely that’s the only measure that matters? The more runs a player scores, the better – that’s an easy one. But remember that this is based on all innings of 30 or greater, so it’s also a test of a player’s ability to capitalise once they’ve gotten past the hard bit – the new ball.

You can see that Alastair Cook has the smallest IQR, and the smallest gap between his minimum runs scored and his lower quartile. Cook’s upper quartile is roughly 130, and he has an outlier at around 180. This suggests that not only is Cook consistent in his run scoring, but when he scores runs, he scores heavily. But what about Hales? Well, he has by far the greatest range and the highest maximum runs scored (as well as three outliers). His IQR, however, is the second largest; again, only Mark Stoneman’s is larger.

This isn’t as much of an issue when you see that his average in innings where he reaches 30 is by far the highest, at close to 75. We can state with some conviction that if Hales gets going in a Test Match this summer, he should, based on previous performance, go big and make some hundreds.

Finally, we’re looking at the percentage of runs scored in boundaries. This is something referenced in the Guardian article, which criticised Hales for often being unable to decide whether to attack or defend. Alastair Cook is the most consistent performer by this measure – his IQR is by far the smallest, which fits with what a lot of commentators and journalists say about him. He is Mr Dependable, and knows his game so well that he often scores from only three shots (the cut, the pull and a ‘punchy’ drive through the off side). Hales, on the other hand, has only a marginally smaller IQR and overall range than Mark Stoneman, leaving Hales in fourth spot again.
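Each of these comparisons boils down to ordering the batsmen by the spread of one measure. Here’s a minimal sketch of that ranking, using IQR alone as the yardstick. That is a simplification, since minimum values and outliers also came into the judgements above, and the player names and figures are invented rather than taken from the County Championship data:

```python
import statistics


def iqr(values: list[float]) -> float:
    """Interquartile range using linear-interpolation quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rank_by_consistency(by_player: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Order players from smallest IQR (most consistent) to largest."""
    spreads = [(name, iqr(values)) for name, values in by_player.items()]
    return sorted(spreads, key=lambda pair: pair[1])


# Invented boundary percentages per qualifying innings, for illustration only.
boundary_pct = {
    "Player A": [40.0, 42.0, 44.0, 45.0, 47.0],
    "Player B": [30.0, 45.0, 55.0, 62.0, 80.0],
    "Player C": [35.0, 44.0, 50.0, 56.0, 66.0],
}

for position, (name, spread) in enumerate(rank_by_consistency(boundary_pct), start=1):
    print(position, name, spread)
# 1 Player A 3.0
# 2 Player C 12.0
# 3 Player B 17.0
```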

All of this information comes back to one question: “So what?”

Well, in my opinion, Hales is a marginally more consistent performer than Mark Stoneman, but Zafar Ansari is probably the best option if you’re looking for consistent performance. However, that isn’t the end of it. What about the make-up of the team? The need for a fast scorer to complement Alastair Cook’s methodical approach is vital when we talk about Hales’ inclusion. Trevor Bayliss is keen to have two stroke-makers in the top three, and you can see why the selectors have kept faith with Hales.

His upside is so great that giving him time to find his feet in international cricket is likely to pay huge dividends in the future. While it may take Hales longer than Cook, Ansari and Lyth to make his first century for England, don’t be surprised if he turns in a 150+ score or hits centuries in consecutive Tests. Yes, we can prove he’s less consistent, but his potential to play match-winning innings is something you simply cannot ignore.