Saturday, August 13, 2011

The great Indian batting line-up of the 2000s

India boast about their great batting line-up, which is studded with Virender Sehwag (Runs: 7694, Avg. 52.69, 100s: 22, 50s: 27), Rahul Dravid (Runs: 12616, Avg. 52.56, 100s: 34, 50s: 60), Sachin Tendulkar (Runs: 14851, Avg. 56.25, 100s: 51, 50s: 60), Sourav Ganguly (Runs: 7212, Avg. 42.17, 100s: 16, 50s: 35) and VVS Laxman (Runs: 8302, Avg. 46.64, 100s: 16, 50s: 54).


In the 2000s India routinely fielded at least three of these five players in their batting line-up, and on occasion all five played. So I looked at the matches in which at least three, at least four, or all five of them played, and checked how much each of them scored. Specifically, I wanted to know whether, when they all got to bat in the same innings (three, four or five of them), they all managed to score at least 50. A team can only benefit from its stars if they bat well together; individual stars are not enough to brighten its prospects (of course Lara could, but even he managed to win only a few Test matches on his own).

In the three figures below I have separated the data into the first and second innings of the Indian batting; this numbering does not refer to the innings of the match. A gray box means that the player played in the match but did not get to bat in that innings. A light blue box means that the player made fewer than 50 runs, and a dark blue box means that he made 50 or more.

For collaborative batting efforts we are interested in how often at least three of them scored 50+ in the same innings. The numbers are shown on the right of each subplot.

When at least three of them played together, all three managed to score 50+ in only 11.43% of cases, which is clearly a small fraction. Even more surprising, two of the three made 50+ scores only 25% of the time. These numbers drop in the second innings, where all three made 50+ scores in only 6% of cases.
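The counting behind these percentages can be sketched roughly as follows. This is only an illustration of the idea, not the script I used for the figures: the `innings` records and the function name `fifty_plus_distribution` are made up for the example, and the scores are invented.

```python
from collections import Counter

# Hypothetical data: one dict per Indian innings, mapping each of the five
# players who batted in that innings to his score. The numbers are invented.
innings = [
    {"Sehwag": 12, "Dravid": 81, "Tendulkar": 55, "Laxman": 64},
    {"Dravid": 7, "Tendulkar": 103, "Ganguly": 30},
    {"Sehwag": 0, "Dravid": 22, "Laxman": 15},
    {"Sehwag": 155, "Tendulkar": 41, "Ganguly": 66, "Laxman": 9},
]


def fifty_plus_distribution(innings, threshold=50):
    """Return {k: percentage of innings in which exactly k players reached threshold}."""
    counts = Counter(
        sum(score >= threshold for score in scores.values())
        for scores in innings
    )
    total = len(innings)
    return {k: 100.0 * n / total for k, n in sorted(counts.items())}


distribution = fifty_plus_distribution(innings)
print(distribution)
```

Running the same count separately on first and second innings, and on the subsets of matches where at least three, at least four, or all five played, would give numbers of the kind shown next to each subplot.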

Performance of five most successful modern Indian batsmen when at least three of them played together.

When at least four of them played together, the four never made 50+ together in the first innings (figure below). In 31% of cases two of them managed to get 50+ in the first innings. The second innings remained a solo-performance affair: in 41% of cases at least one of them scored 50+.

Performance of five most successful modern Indian batsmen when at least four of them played together.
When all five played together, the media boasted about the strength of the Indian batting, but never did all five make 50+ together. Playing all of them at the same time meant that at least two of them scored 50+ in about 40% of cases in the first innings. The second innings again showcased individual performances.

Performance of five most successful modern Indian batsmen when all five played together.

This analysis shows that even though India has produced some very good batsmen in the last two decades or so, they have only rarely clicked together. I am not asking all of them to score runs in the same match, but it is fair to expect at least three of the five to score in the same innings, and unfortunately this has happened only 3% of the time in the first innings and 12% of the time in the second innings... certainly not often enough for India to really exploit the batting abilities of these geniuses, who as of now seem to like to work alone... to increase their individual scores and records...

right arm over
Arvind

Thursday, August 11, 2011

Fall of wicket and the Nelson effect in cricket

In cricket there is a superstition, among many others, that at a score of 111 or its multiples there is an increased chance of a wicket falling. The number 111 is called the 'Nelson' in cricket. The legend says that Lord Nelson, the British admiral of the Napoleonic wars, had lost one arm, one leg and one eye (in fact he never lost a leg). I am not sure whether he played cricket or was related to cricket in some way, but his situation became an expression in cricket for a score of 111. The famous umpire David Shepherd made this number even more special by hopping on one leg whenever the score reached 111.

On the second day of the third Test match between India and England, when Cook and Strauss were trouncing the hapless Indian bowling attack, there was a discussion on Cricinfo about the probability of a wicket falling at 111. So I looked into the partnership data from the last 1950 Test matches.

The score at the fall of the first wicket seems to follow an exponential distribution. The figure below shows the probability of reaching a certain score before the loss of the first wicket. The different lines show this distribution for different innings (top plot). The figure shows that there is about a 0.6% chance that the first wicket will fall at a score of 50.
Fraction to score a certain number of runs before the fall of first wicket.

Because of the nature of the data it makes sense to plot these graphs on log-linear axes (lower plot). In such plots a straight line indicates an exponential distribution. The pale blue line corresponds to an exponential function. By and large the shape of the distribution does not change from innings to innings.
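To spell out why a straight line on log-linear axes points to an exponential: writing $x$ for the score at the fall of the wicket and $\lambda$ for the decay rate (both symbols are my notation here, not something taken from the figure), taking the logarithm of the density gives a linear function of $x$.

```latex
p(x) = \lambda e^{-\lambda x}
\quad\Longrightarrow\quad
\ln p(x) = \ln \lambda - \lambda x
```

So the slope of the line is $-\lambda$, and a steeper line simply means a smaller typical partnership, since the mean of this distribution is $1/\lambda$.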

Therefore, nothing in the data suggests that there is an increased chance of the first wicket falling at 111.
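One rough way to put a number on this is to compare the observed fraction of first wickets falling at exactly 111 with what a smooth exponential decay would predict. The sketch below uses synthetic scores as a stand-in for the real partnership data, so the assumed mean of 35 runs and the helper names are purely illustrative. Since scores are whole runs, an exponential decay becomes a geometric distribution, which is what `geometric_fraction` assumes.

```python
import random

random.seed(0)

# Synthetic stand-in for the real data: first-wicket scores drawn from an
# exponential distribution with an assumed mean of 35 runs, rounded down
# to whole runs.
scores = [int(random.expovariate(1 / 35)) for _ in range(100_000)]


def fraction_at(scores, value):
    """Observed fraction of innings in which the wicket fell at exactly `value`."""
    return sum(s == value for s in scores) / len(scores)


def geometric_fraction(mean, value):
    """Expected fraction at `value` for a geometric distribution with this mean.

    A geometric distribution on 0, 1, 2, ... has P(k) = (1 - q) * q**k,
    and its mean is q / (1 - q), so q = mean / (1 + mean).
    """
    q = mean / (1 + mean)
    return (1 - q) * q ** value


mean_score = sum(scores) / len(scores)
observed = fraction_at(scores, 111)
expected = geometric_fraction(mean_score, 111)
print(f"observed at 111: {observed:.5f}, expected: {expected:.5f}")
```

If the Nelson effect were real, the observed fraction in the actual data would sit clearly above the expected one, and the same would hold at 222 and 333.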

Next, we can look at the runs scored for each wicket in the different innings in the same way. In the figure below I chose the log-linear representation because it is more revealing. Different colors belong to different wickets, with the first wicket being the blue line and the 10th wicket the red line. Different subplots are for different innings.

Inning by inning fraction of fall of wickets as a function of runs scored.

Naturally the lines are steeper for the lower-order wickets, but largely the distribution is exponential. That is, the fall of a wicket, when looked at over all the matches, all the conditions, and all the bowlers and batsmen, turns out to be a random process. The exponential distribution is a simplification, and more sophisticated distributions could describe this phenomenon better. Maybe I will do it some other time, when I am sitting in a boring lecture, like now...

see also
http://en.wikipedia.org/wiki/Nelson_%28cricket%29

Wednesday, May 4, 2011

Greatness of Sachin Tendulkar lies in his longevity

What makes Sachin Tendulkar a truly great player? There is no objective answer to this question. Every cricketer and cricket fan will have their own answers and arguments. In this age of data we have a better chance of identifying the components of Sachin's greatness. Below I provide one such analysis.

I have plotted a few batting descriptors for the top 50 run-getters in both Test matches and ODIs. The seven descriptors I am using are:
Not Out [panel A]
Cumulative Runs [panel B]
Highest Score [panel C]
Batting Average [panel D]
Centuries [panel E]
Fifties [panel F]
Zeros [panel G]

Specifically, I am interested in knowing whether there is any correlation between any of the above descriptors and the number of matches played. It is kind of obvious that if you are a batsman, the more matches you play, the more runs and centuries you will get.

In the two figures below the top one belongs to ODI matches and the lower one belongs to Test matches. Each player is marked with a colored circle. The color for each player is mentioned on the far right.

Indeed, cumulative runs scored has a correlation of 0.89 (ODI) and 0.84 (Test) with the number of matches played [panels B in the figures below].
The same holds for the number of 50s, with a correlation of 0.74 (ODI) and 0.89 (Test) [panels F in the figures below].
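For anyone who wants to repeat this on their own numbers, a correlation of this kind can be computed with a few lines of Python. The sketch below assumes the ordinary Pearson correlation coefficient, and the `matches` and `runs` lists are invented values, not the actual career figures of the 50 players.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented example: matches played and cumulative runs for five players.
matches = [100, 120, 150, 180, 200]
runs = [7000, 8500, 10200, 12500, 15000]

r = pearson(matches, runs)
print(f"correlation between matches and runs: {r:.2f}")
```

A value close to 1 means that the descriptor is more or less fixed by the number of matches played; a value near 0 means that longevity tells you little about it.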

In Test matches even the number of 100s scored is highly correlated with the number of matches (correlation 0.84). This is much less true for ODIs (correlation 0.57). This analysis shows that Sachin's centuries in Test matches are more or less predicted by his longevity. However, his count of ODI centuries is indeed a signature of his class and greatness.

The analysis also shows that in ODIs there is a weak but significant correlation between longevity and batting average. A similar trend is seen in Test matches, but that is more because of Sir Don Bradman.

Other descriptors are not correlated with longevity.

In both pictures Sachin's stats are marked with a green circle and a small blue circle. For comparison, in Test matches I have marked Don Bradman with a blue circle.

In summary, Sachin's runs, centuries and fifties in Test matches, and his runs and fifties in ODIs, are just a simple case of statistics. Any of the players shown in the figures would have matched that feat had they played as many matches. Sachin's true brilliance shows up in the fact that he has scored so many centuries in ODIs.

Overall, this data suggests that largely Sachin's greatness is nothing but his longevity.




right arm over
Arvind