Friday, March 9, 2012

My tribute to Rahul Dravid: What if he had debuted with Sachin?

Rahul Dravid has decided to call it a day, and so ends an illustrious career. A lot has been written and more will be written in the days to come, but those who have witnessed Rahul's batting know that no amount of runs can do justice to the man and his career. I often take refuge in 'numbers' when it comes to defining a career, because memory gets clouded and a lot of prejudice enters our evaluation.

This is especially true in the case of Rahul, who remained in the shadow of Sachin Tendulkar all through his career. They say that he did not mind, but what choice did he have when the whole country decided to remain mesmerized by Sachin (for a few reasons, of course)?

Dravid's first Test match somehow defined his career: a glorious knock of 95 at Lord's on debut, but Ganguly, another debutant, scored a century and took all the limelight. This has been Rahul's fate all through his career. The judgement of the Indian cricket masses is strongly clouded by emotion, and no one really pays attention to the numbers. For some reason we treat Zaheer Khan as a great bowler (in the Indian context, of course) but forget to compare his record with that of Javagal Srinath, who is not considered a great bowler.

So I looked through the numbers again and compared Rahul Dravid with our benchmark of greatness, Sachin Tendulkar, in the matches they played together. I hate to make this comparison with bare numbers, so I made some graphics. I asked a simple question: when Sachin and Rahul batted in the same innings of a Test match, who scored more runs? I do not want to calculate averages, which tell us little; I will only give you the raw data in pictorial form so you can judge with less emotion and more objectivity. Here is how to read the figure.
Figure: Sachin Tendulkar vs Rahul Dravid.
Rahul is marginally better, if at all, when they play together.


Panel A: Innings-by-innings runs scored by Sachin Tendulkar. The dark line is a smoothed version of the same data.
Panel B: Innings-by-innings runs scored by Rahul Dravid. The dark line is a smoothed version of the same data.
Panel C: Probability distribution of runs scored by Sachin (brown line) and Rahul (blue line). This graph shows the probability of scoring a given number of runs in an innings. It turns out that there is no difference between the two when they batted in the same innings. Moreover, it is all noise, as the distribution is an exponential (thick gray line), characteristic of a class of stochastic processes (the Poisson process).
Panel D: Scatter plot of runs scored by Sachin (Y-axis) and Rahul (X-axis) when they played together. If a point is above the gray line, Sachin scored more runs; if it is below the gray line, Rahul scored more runs. It turns out that Rahul outscored Sachin 130 times while Sachin outscored Rahul 118 times. But there is not much difference in the runs they scored: Rahul (12981) scored 395 runs more than Sachin (12586). Ideally we would want both to score together, but that was rare.
Panel E: The scatter data of panel D presented as a probability in pseudocolors.
Panel F: Innings-by-innings difference between the scores of Rahul and Sachin. A positive number means Rahul scored more and a negative number means Sachin scored more.
Panel G: The distribution of the run difference shown in panel F is Gaussian noise.
Panel H: Autocorrelation of run scoring by Sachin (brown line) and Rahul (blue line). Once again, as revealed by panel C, the run scoring of the two players is a stochastic process. For Rahul, though, there is a somewhat increased chance of repeating his performance (failure or success) every 5 matches (I am not sure if it is significant).
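
The head-to-head tally of panel D can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `head_to_head`, the tuple layout and the sample scores are made up, and are not the data behind the figure.

```python
def head_to_head(pairs):
    """Tally paired innings scores.

    pairs: list of (sachin_runs, rahul_runs) tuples, one per innings
    in which both batted. This data layout is an assumption.
    """
    sachin_ahead = sum(1 for s, r in pairs if s > r)
    rahul_ahead = sum(1 for s, r in pairs if r > s)
    return {
        "sachin_ahead": sachin_ahead,
        "rahul_ahead": rahul_ahead,
        "level": len(pairs) - sachin_ahead - rahul_ahead,
        "sachin_total": sum(s for s, _ in pairs),
        "rahul_total": sum(r for _, r in pairs),
    }


# Made-up scores for illustration only.
sample = [(45, 12), (0, 103), (76, 76), (8, 31)]
print(head_to_head(sample))
```

The per-innings differences plotted in panel F would simply be `r - s` for each tuple.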

What does this analysis tell us? Because I considered only the matches in which they played together, I automatically corrected for the conditions, the bowlers (they bat at no. 3 and 4) and other effects. So when I look at this analysis I do not see any difference between Sachin and Rahul. If I had to choose, perhaps I would go for Rahul, given that he scored 395 more runs. But I might not play them together, because the chance that both will score a good number of runs is rather small.

This great similarity between the two giants of Indian cricket, and indeed of world cricket in the 2000s, makes me ask why Rahul remained in Sachin's shadow. What if Rahul had debuted slightly before Sachin Tendulkar: would we be worshipping Rahul and not Sachin?

Perhaps Sachin is so popular because of his early start and his explosive batting in limited-overs cricket, something Rahul was never allowed to do the way Sachin was. I will do that analysis some other day. But here is a humble request: please respect Rahul for what he has achieved. His numbers tell a very good story, and they say quite conclusively that, if anything, Rahul Dravid is better than Sachin Tendulkar. Rahul is being humble when he says that the next generation is well equipped; we all know the vacuum he is leaving at number 3.

right arm over
Arvind

Wednesday, February 8, 2012

The myth of bowling in the right areas

Whenever a bowler is hit for a boundary, modern cricket commentators shower all kinds of cliches on the shot, and when they return to their senses they make a sorry remark that the bowler should be bowling in the right areas. With batting dominating, 'bowling in the right areas' is becoming the new cliche of contemporary cricket commentary. It certainly gives the impression that the commentator knows what these so-called right areas are. Unfortunately, they never make any sensible suggestion, so I assume that, like the poor bowler and his captain, they have no idea where the 'right area' might be on the pitch.
Ranji invented the leg glance to deal with leg-side bowling
Let's try to narrow down these 'right areas' by first isolating the bad areas, starting with the 'bad line to bowl'. Down the leg side is never considered good. The reason is simple: a slight error and the delivery will be a wide. Traditionally, umpires do not give LBW to balls pitching outside or even in line with the leg stump. Finally, there may still be some stigma from the Bodyline series. Too far outside the off stump is not good either, because it gives the batsman enough room to play his shot. Moreover, the batsman can decide to leave the ball. So unless the bowler can move the ball in, there is not much point in bowling outside the off stump.

Next, the length of the ball. Bowling too full or too short is bad. Usually they recommend a 'good length', which is something like two-thirds of the length of the pitch. Short balls without much pace only invite well-executed pull shots. Too full is rather easy to play if it is not combined with swing.

So is that it, then: bowl around the off stump at about two-thirds of the length, in the so-called 'corridor around the off and the middle stump'? I am sure all bowlers know this. This is what they practice. This is what the coaches train them for.

However, it is not true that a ball bowled in the 'right areas' is going to give you a wicket, or will at least trouble the batsman, every time you bowl it. Evidently, balls pitched well within the 'right areas' are smacked for easy boundaries, and at times even balls pitched in the so-called 'bad areas' get you a prized wicket. In fact, I did a little survey myself on the Hawk-Eye data that is available for some matches on the cricinfo website to confirm this.

So if you have played cricket at any serious level, you know that in reality there are no right areas. Once a batsman knows what the bowler is going to do, he can execute any shot to any delivery. So the bowler's biggest enemy is becoming predictable.

Along the same lines, there is no perfect ball that can give you a wicket every time you bowl it. You may get some success on a few occasions, but soon batsmen will have a strategy to play that ball for maximum score. The invention of the leg glance by Ranjitsinhji was the first such instance, and the development of a number of new shots in recent times, including the 'switch-hit', gives a clear indication that batsmen can come up with an antidote to any delivery, given some time.
Glenn surprised everyone. He bowled not just in the corridor but in the corridor of uncertainty


So in my experience the 'right bowling areas' are a myth created by our modern cricket commentators, who want to sound educated in cricket. The biggest enemy of a bowler is the monotony of his bowling, no matter how elegant it appears from the commentary box. The biggest ally a bowler can have in the middle of a spell is variability and a sense of surprise in his bowling.


Unfortunately, this part of bowling never makes it into the statistics, and we think that Glenn McGrath was a great bowler because he consistently bowled in the corridor of uncertainty. I think you should watch him again. And not just Glenn: pick videos of any successful bowler and you will find that 'unpredictability' was the main weapon in their armoury.

right arm over
Arvind

Tuesday, January 3, 2012

Making of a Test bowler: Time it takes to find your feet

Interesting times are back in cricket. Debutant bowlers since September 2011 have surprised the batsmen, and as many as six new entrants (four of them fast bowlers) have started their careers with a five-for. In fact the 12 debutant bowlers of 2011 have shared as many as 18 five-fors. Although the data is not yet sufficient, this new breed of bowlers brings hope that the balance between bat and ball will finally be restored (see Number Game). What is really encouraging is that six of these new bowlers are less than 22 years of age. So I think there are interesting times ahead for Test cricket, where runs will not be easy, or at least that is what I would like to happen.


To appreciate the importance of these superb performances by the debutant bowlers, we (together with Ajit Padmanabhan) looked at the probability of taking a certain number of wickets in the first four innings. To this end we looked at bowlers who took at least 200 wickets and bowled in at least 71 innings (otherwise the great Clarrie Grimmett would be left out, and we don't want that). This gives us 57 bowlers for the analysis.


So we asked how these obviously successful bowlers performed early in their careers. The figure below shows the probability of taking a certain number of wickets in each of the first four innings for a bowler who finished his career with at least 200 wickets (left panel). The probability decreases as the wicket count increases. This decrease is almost linear, but more data may change the picture. The figure also shows that even for these very successful bowlers the probability of taking five wickets or more is only 7% (right panel). This really reveals how amazing the success of the class of 2011 has been.
Figure 1. Left: Probability of taking a certain number of wickets in the first, second, third and fourth career innings of a bowler. The linear decay of the probability traces shows how difficult it is to take more wickets in a Test innings.
Right: Probability of taking 'n' or more wickets in the first four career innings. The dark line in both panels is the average over the four innings. It turns out that there is only a 7% chance of taking five or more wickets in your first four outings as a Test bowler. Data taken from cricinfo.
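
The two panels of Figure 1 amount to an empirical distribution and its 'n or more' tail. Here is a minimal Python sketch of that calculation, assuming the hauls are available as a flat list of wickets per innings; the function name `wicket_probabilities` and the sample hauls are hypothetical, not the cricinfo data.

```python
from collections import Counter


def wicket_probabilities(hauls, max_wickets=10):
    """Return (exact, at_least) probability lists indexed by wickets.

    hauls: wickets taken in each innings considered (assumed layout).
    exact[k]    = fraction of innings with exactly k wickets
    at_least[k] = fraction of innings with k or more wickets
    """
    counts = Counter(hauls)
    total = len(hauls)
    exact = [counts.get(k, 0) / total for k in range(max_wickets + 1)]
    at_least = [
        sum(c for w, c in counts.items() if w >= k) / total
        for k in range(max_wickets + 1)
    ]
    return exact, at_least


# Made-up hauls for illustration only.
hauls = [0, 1, 1, 2, 3, 5, 0, 2, 1, 6]
exact, at_least = wicket_probabilities(hauls)
print(exact[1], at_least[5])
```

In this made-up sample, `at_least[5]` plays the role of the 7% figure quoted above.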


How much time should you invest in a new bowler?
In cricket, early success or failure is no predictor of long-term success. So how much time should a team invest in a bowler if he is not delivering? We turned to the numbers of the bowlers who took at least 200 wickets and extended our previous analysis. This time we averaged the innings-by-innings wickets of the bowlers in the 200+ club. It turns out that, when averaged over 57 bowlers, it takes about 10 innings before the bowlers reach their steady state of taking on average 1.5 wickets per innings.


Figure 2. Evolution of a Test bowler. The probability distribution shown in Figure 1 is color coded and shown for the first 71 innings of 57 bowlers who have taken at least 200 wickets. Dark colors mean low probability and bright colors mean high probability. The thick blue line is the mean of the probability distributions. The slow rise of the blue line indicates that Test bowlers take some time to find their feet at the highest level of cricket. After about 10 innings the line hovers around 1.5, indicating that from then on a bowler's stats do not vary much. It also shows that even the very successful bowlers on average take only about 2 wickets per innings. This is consistent with the fact that most teams field 4-5 bowlers who fight to take 10 wickets.
So we argue that teams should watch their bowlers for at least 5 Test matches before giving up on them. By the same token, if a bowler has not reached his steady state of 1.5 wickets by the fifth match, there is little chance for him, statistically speaking.
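
The blue line of Figure 2 is an average across bowlers at each career innings. A rough Python sketch of that averaging follows, assuming each career is a list of wickets per innings in chronological order; the name `mean_wickets_by_innings` and the sample careers are made up for illustration.

```python
def mean_wickets_by_innings(careers, n_innings):
    """Average wickets at each career innings index across bowlers.

    careers: list of per-bowler lists of wickets per innings
    (assumed layout). Bowlers with fewer innings than the index
    are skipped at that index.
    """
    means = []
    for i in range(n_innings):
        values = [career[i] for career in careers if len(career) > i]
        means.append(sum(values) / len(values) if values else float("nan"))
    return means


# Made-up careers for illustration only.
careers = [[0, 1, 2, 2], [1, 1, 1, 3], [0, 2, 3, 1]]
print(mean_wickets_by_innings(careers, 4))
```

The steady state would then be read off as the level at which this curve stops rising.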


right arm over
Arvind

PS: I was helped by Ajith Padmanabhan in collection and analysis of the data.



Saturday, August 13, 2011

The great Indian batting line-up of the 2000s

India boast about their great batting line-up, which is studded with Virender Sehwag (Runs: 7694, Avg.: 52.69, 100s: 22, 50s: 27), Rahul Dravid (Runs: 12616, Avg.: 52.56, 100s: 34, 50s: 60), Sachin Tendulkar (Runs: 14851, Avg.: 56.25, 100s: 51, 50s: 60), Sourav Ganguly (Runs: 7212, Avg.: 42.17, 100s: 16, 50s: 35) and VVS Laxman (Runs: 8302, Avg.: 46.64, 100s: 16, 50s: 54).


In the 2000s India routinely fielded at least three of these five players in their batting line-up, and on occasion all of them played. So I looked at the data for matches in which at least 3, 4 or all five played and checked how much each of them scored. Specifically, I was interested in knowing whether, when they all got to bat (three, four or five of them), they all managed to score at least 50. The only way a team can benefit from its stars is if they all bat well together; otherwise individual stars are not enough to brighten the prospects of a team (of course Lara could do it, but even he managed to win only a few Test matches on his own).

In the three figures below I have separated the data into the first and second innings of the Indian batting. This does not refer to the innings of the match. A gray box means that the corresponding player played in the match but did not get to bat in that innings. A light blue box means that the player made fewer than 50 runs, and a dark blue box means that the player made 50 or more.

For collaborative batting efforts we are interested in knowing how often and how regularly at least three of them made a 50+ score in the same innings. The numbers are shown on the right of each subplot.
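
As a rough sketch of the counting behind those numbers, the Python below tallies, for each innings, how many of the stars who batted reached 50. It assumes one list of scores per innings; the function name `fifty_plus_share` and the sample scores are invented for illustration.

```python
from collections import Counter


def fifty_plus_share(innings, threshold=50):
    """Percentage of innings in which exactly k stars reached the threshold.

    innings: list of lists, each holding the scores of the stars who
    batted in that innings (assumed layout).
    Returns {k: percentage} for every k that occurs.
    """
    tally = Counter(
        sum(1 for score in scores if score >= threshold) for scores in innings
    )
    total = len(innings)
    return {k: 100.0 * n / total for k, n in sorted(tally.items())}


# Made-up scores for illustration only.
innings = [[60, 10, 75], [5, 20, 30], [50, 51, 52], [100, 0, 49]]
print(fifty_plus_share(innings))
```

Summing the shares for k and above gives the 'at least k of them' figures.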

When at least three of them played together, in only 11.43% of cases did all three manage to score 50+ runs, which is clearly a small fraction. Even more surprising is the fact that only 25% of the time did two of the three make 50+ scores. These numbers drop in the second innings, when in only 6% of cases all three made 50+ scores.

Performance of five most successful modern Indian batsmen when at least three of them played together.

When at least four of them played together, in the first innings they never all made a 50+ score together (figure below). In 31% of cases two of them managed to get 50+ in the first innings. The second innings remained a case of solo performances: in 41% of cases at least one of them scored 50+.

Performance of five most successful modern Indian batsmen when at least four of them played together.
When all five played together, the media boasted about the strength of the Indian batting, but never did all five make 50+ together. Playing all of them at the same time meant that at least two of them scored 50+ in about 40% of cases in the first innings. The second innings again showcased individual performances.

Performance of five most successful modern Indian batsmen when all five played together.

This analysis shows that even though India has produced some very good batsmen in the last two decades or so, they have only rarely clicked together. I am not asking all of them to score runs in the same match, but it is reasonable to expect at least three of the five to score in the same innings, and unfortunately this has happened only 3% of the time in the first innings and 12% of the time in the second innings... certainly not often enough for India to really exploit the batting abilities of these geniuses, who as of now seem to like to work alone... to increase their individual scores and records...

right arm over
Arvind

Thursday, August 11, 2011

Fall of wicket and the Nelson effect in cricket

In cricket there is a superstition, among many others, that at a score of 111 or its multiples there is an increased chance of a wicket falling. The number 111 is called 'Nelson' in cricket. The legend refers to Admiral Lord Nelson, the British naval commander of the Napoleonic Wars, who is said to have lost one arm, one leg and one eye (in fact he lost an eye and an arm, but never a leg). I am not sure whether he played cricket or was related to the game in some way, but his situation became cricket's expression for a score of 111. The famous umpire David Shepherd made this number even more special by hopping on one leg whenever the score reached 111.

On the second day of the third Test match between India and England, when Cook and Strauss were trouncing the hapless Indian bowling attack, there was a discussion on cricinfo about the probability of a wicket falling on 111. So I looked into the partnership data from the last 1950 Test matches.

The score at the fall of the first wicket seems to follow an exponential distribution. The figure below shows the probability of reaching a certain score before the loss of the first wicket. The different lines show this distribution for different innings (top plot). The figure shows that there is about a 0.6% chance that the wicket will fall at a score of 50.
Fraction of innings reaching a certain number of runs before the fall of the first wicket.

Because of the nature of the data it makes sense to plot these graphs on log-linear axes (lower plot). In such plots a straight line indicates an exponential distribution. The pale blue line corresponds to an exponential function. By and large the nature of the exponential does not change with the innings, and the behavior remains similar.

Therefore, there is nothing in the data to suggest an increased chance of the first wicket falling at 111.
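
The straight-line test on log-linear axes can be sketched as a least-squares fit to the logarithm of the counts. The Python below is a minimal illustration using only the standard library; the function name `fit_exponential` and the synthetic counts are assumptions, not the partnership data used for the figures.

```python
import math


def fit_exponential(scores, counts):
    """Fit log(count) = intercept - rate * score by least squares.

    Bins with a zero count are skipped because log(0) is undefined.
    Returns (rate, intercept).
    """
    points = [(x, math.log(c)) for x, c in zip(scores, counts) if c > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


# Synthetic counts that decay exponentially, for illustration only.
scores = list(range(0, 200, 10))
counts = [1000.0 * math.exp(-0.03 * s) for s in scores]
rate, intercept = fit_exponential(scores, counts)
print(round(rate, 4))
```

A genuine Nelson effect would show up as the bin at 111 sitting clearly above the fitted line.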

Next, we can look at the runs scored for each wicket in different innings in the same way. In the figure below I chose the log-linear representation because it is more revealing. Different colors belong to different wickets, with the first wicket being the blue line and the 10th wicket the red line. Different subplots are for different innings.

Inning by inning fraction of fall of wickets as a function of runs scored.

Naturally the lines have a steeper slope for the lower-order wickets, but by and large the distribution is exponential. That is, the fall of a wicket, when looked at over all the matches, all the conditions, all the bowlers and batsmen, turns out to be a random process. The exponential distribution is a simplification, and more sophisticated distributions can be used to better describe this phenomenon. Maybe I will do that some other time, when I am sitting in a boring lecture, like now...

see also
http://en.wikipedia.org/wiki/Nelson_%28cricket%29

Wednesday, May 4, 2011

Greatness of Sachin Tendulkar lies in his longevity

What makes Sachin Tendulkar a truly great player? There is no objective answer to this question. Every cricketer and cricket fan will have their own answers and arguments. In this age of data we have a better chance of identifying the components of Sachin's greatness. Below I provide one such analysis.

I have plotted a few descriptors of batting for the top 50 run-getters in both Test matches and ODIs. The seven descriptors I am using are
Not Out [panel A]
Cumulative Runs [panel B]
Highest Score [panel C]
Batting Average [panel D]
Centuries [panel E]
Fifties [panel F]
Zeros [panel G]

Specifically, I am interested in knowing whether there is any correlation between any of the above descriptors and the number of matches played. It is fairly obvious that if you are a batsman, the more matches you play, the more runs and centuries you will get.

In the two figures below the top one belongs to ODI matches and the lower one belongs to Test matches. Each player is marked with a colored circle. The color for each player is mentioned on the far right.

Indeed, cumulative runs scored have a correlation of 0.89 (ODI) and 0.84 (Test) with the number of matches played [panels B in the figures below].
The same holds for the number of 50s, with a correlation of 0.74 (ODI) and 0.89 (Test) [panels F in the figures below].
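
For readers who want to reproduce numbers of this kind, the correlation used here is presumably the ordinary Pearson coefficient (the post does not say which one). A small Python sketch with made-up figures follows; the function name `pearson` and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up matches and runs for five players, for illustration only.
matches = [100, 120, 150, 160, 190]
runs = [7000, 8800, 10100, 12400, 14900]
print(round(pearson(matches, runs), 2))
```

A value near 1 means the descriptor is largely explained by the number of matches played.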

In Test matches even the number of 100s scored is highly correlated with the number of matches (correlation 0.84). This is much less true for ODIs (correlation 0.57). This analysis shows that Sachin's centuries in Test matches are more or less predicted by his longevity. However, his count of ODI centuries is indeed a signature of his class and greatness.

The analysis also shows that in ODIs there is a weak but significant correlation between longevity and batting average. A similar trend is seen in Test matches, but that is more because of Sir Don Bradman.

Other descriptors are not correlated with longevity.

In both the pictures Sachin's stats are marked with a green circle and a small blue circle. For a comparison in test matches I have marked Don Bradman with a blue circle.

In summary, Sachin's runs, centuries and fifties in Test matches, and his runs and fifties in ODIs, are a simple case of statistics. Any of the players shown in the figure would have matched that feat given as many matches. Sachin's true brilliance shows up in the fact that he has scored so many centuries in ODIs.

Overall, this data suggests that largely Sachin's greatness is nothing but his longevity.




right arm over
Arvind

Saturday, May 15, 2010

Duckworth-Lewis Calculation Graphs for Different Duration Matches

I think one of the best things to have happened to cricket in the 1990s was the development of a rule to update the target score when there is an interruption and the batting team cannot have its full set of overs. Even more remarkable was that the ICC, which has a reputation for always choosing the wrong option, accepted this rule. The rule was devised by two statisticians, Frank Duckworth and Tony Lewis, and is appropriately named after them.

According to Duckworth and Lewis, cricket is about using resources (overs and wickets) to achieve a target set by an opposition who batted first. Therefore, in case of an interruption, both overs and wickets should be considered. 

Exact D-L calculations often become very cumbersome. Thus, in the absence of a sophisticated computer, most teams in small leagues and friendly matches devise some ad hoc rule. Here I show graphs that can be used to estimate the modified target after an interruption. The D-L method was originally developed for 50-over matches, but in small leagues and friendly games matches are usually shorter, so I estimated the D-L calculations for 20, 30, 40 and 50 over games.

How to read and use the graphs
The x-axis is the number of overs remaining. The different colors of the lines refer to the number of wickets fallen. So, to estimate the scaling factor for updating the target, you need to know how many overs have been bowled, and hence how many remain (the x-axis), and how many wickets the team batting second has lost (the color of the line). For that combination of overs and wickets, the y-axis gives a percentage, which is used to scale the original target as in the example that follows.


Imagine you are playing a 40-over game and set a target of 220. Rain stops play after 25 overs, when the chasing team has lost 6 wickets. What score should the team batting second have reached to win the game? Draw a vertical line at 15 overs (the overs remaining) and find where it crosses the green line; from that crossing draw a horizontal line and read off the value on the y-axis. In this example we get 43.6. This means that the opposition should have scored 56.4% of the target, i.e. 220 × (100 - 43.6)/100 ≈ 124 runs. This number would go up if they had lost one more wicket, to 142, and so on.
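
The arithmetic of this example can be checked with a couple of lines of Python. This is only the simplified graphical rule described in this post, not the official D-L calculation; the function name `par_score` is made up, and the 43.6 is the value read off the graph in the example.

```python
def par_score(target, y_value_pct):
    """Score the chasing side should have reached at the interruption.

    target: the original target.
    y_value_pct: the percentage read off the y-axis of the graph
    (43.6 in the example above).
    """
    return target * (100.0 - y_value_pct) / 100.0


# 40-over game, target 220, 15 overs left, 6 wickets down.
print(round(par_score(220, 43.6), 2))
```

With these inputs the result comes to about 124 runs.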

Enjoy your own simplified graphical D-L update rule, but I do wish that your games go on uninterrupted...

right arm over
Arvind